* [BUG] 2.6.24-rc3-git2 softlockup detected
@ 2007-11-28 6:29 Kamalesh Babulal
2007-11-28 6:53 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Kamalesh Babulal @ 2007-11-28 6:29 UTC (permalink / raw)
To: LKML, linuxppc-dev, Andy Whitcroft, Balbir Singh
Hi,
Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
GPR12: d000000000143610 c000000000473d00
NIP [c00000000002f02c] .ioread8+0x14/0x60
LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
Call Trace:
[c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
[c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
[c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
[c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
[c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
[c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
[c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
[c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
[c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
[c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
[c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
[c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
[c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
[c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-28 6:29 [BUG] 2.6.24-rc3-git2 softlockup detected Kamalesh Babulal
@ 2007-11-28 6:53 ` Andrew Morton
2007-11-28 7:17 ` Kamalesh Babulal
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-11-28 6:53 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: linux-scsi, LKML, Rafael J. Wysocki, linuxppc-dev, Andy,
Balbir Singh
On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Hi,
(cc linux-scsi, for sym53c8xx)
> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
I assume this is a post-2.6.23 regression?
> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
> REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
> GPR12: d000000000143610 c000000000473d00
> NIP [c00000000002f02c] .ioread8+0x14/0x60
> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> Call Trace:
> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
> Instruction dump:
> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
I see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's
being called lots of times from a higher level.. Do the traces all look
the same?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-28 6:53 ` Andrew Morton
@ 2007-11-28 7:17 ` Kamalesh Babulal
2007-11-28 7:25 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Kamalesh Babulal @ 2007-11-28 7:17 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-scsi, LKML, Rafael J. Wysocki, linuxppc-dev, Balbir Singh
Andrew Morton wrote:
> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
>> Hi,
>
> (cc linux-scsi, for sym53c8xx)
>
>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
>
> I assume this is a post-2.6.23 regression?
>
>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
>> REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
>> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
>> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
>> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
>> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
>> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
>> GPR12: d000000000143610 c000000000473d00
>> NIP [c00000000002f02c] .ioread8+0x14/0x60
>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>> Call Trace:
>> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
>> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
>> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
>> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
>> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
>> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
>> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
>> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
>> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
>> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
>> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
>> Instruction dump:
>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
>
> I see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's
> being called lots of times from a higher level.. Do the traces all look
> the same?
Hi Andrew,
I see this call trace twice and both looks similar and on another reboot
the following trace is seen twice in different cpu
BUG: soft lockup detected on CPU#3!
Call Trace:
[C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable)
[C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c
[C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30
[C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488
[C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100
--- Exception: 901 at .ioread8+0x14/0x60
LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
[C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable)
[C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
[C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
[C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc
[C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c
[C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138
[C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4
[C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40
[C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c
[C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4
[C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4
[C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
[C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998
[C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-28 7:17 ` Kamalesh Babulal
@ 2007-11-28 7:25 ` Andrew Morton
2007-11-29 6:31 ` Kamalesh Babulal
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-11-28 7:25 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: linux-scsi, LKML, Rafael J. Wysocki, linuxppc-dev, Andy,
Balbir Singh
On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Andrew Morton wrote:
> > On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> >
> >> Hi,
> >
> > (cc linux-scsi, for sym53c8xx)
> >
> >> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
> >
> > I assume this is a post-2.6.23 regression?
> >
> >> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
> >> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
> >> REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
> >> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
> >> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
> >> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
> >> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
> >> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
> >> GPR12: d000000000143610 c000000000473d00
> >> NIP [c00000000002f02c] .ioread8+0x14/0x60
> >> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> >> Call Trace:
> >> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
> >> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> >> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
> >> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
> >> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
> >> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
> >> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
> >> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
> >> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
> >> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
> >> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
> >> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
> >> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
> >> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
> >> Instruction dump:
> >> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
> >> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
> >
> > I see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's
> > being called lots of times from a higher level.. Do the traces all look
> > the same?
>
> Hi Andrew,
>
> I see this call trace twice and both looks similar and on another reboot
> the following trace is seen twice in different cpu
>
> BUG: soft lockup detected on CPU#3!
> Call Trace:
> [C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable)
> [C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c
> [C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30
> [C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488
> [C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100
> --- Exception: 901 at .ioread8+0x14/0x60
> LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
> [C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable)
> [C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
> [C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
> [C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc
> [C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c
> [C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138
> [C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4
> [C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40
> [C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c
> [C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4
> [C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4
> [C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
> [C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998
> [C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40
>
hm, odd.
Can you look up sym_hcb_attach+0x1194/0x1384 in gdb? Something like
- Enable CONFIG_DEBUG_INFO
- gdb sym53c8xx.o
(gdb) p sym_hcb_attach
<prints 0xsomething>
(gdb) p/x 0xsomething + 0x1194
<prints 0xsomethingelse>
(gdb) l *0xsomethingelse
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-28 7:25 ` Andrew Morton
@ 2007-11-29 6:31 ` Kamalesh Babulal
2007-11-29 8:35 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Kamalesh Babulal @ 2007-11-29 6:31 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-scsi, LKML, Rafael J. Wysocki, linuxppc-dev, Balbir Singh
Andrew Morton wrote:
> On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
>> Andrew Morton wrote:
>>> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>>>
>>>> Hi,
>>> (cc linux-scsi, for sym53c8xx)
>>>
>>>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
>>> I assume this is a post-2.6.23 regression?
>>>
>>>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
>>>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
>>>> REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
>>>> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
>>>> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
>>>> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
>>>> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
>>>> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
>>>> GPR12: d000000000143610 c000000000473d00
>>>> NIP [c00000000002f02c] .ioread8+0x14/0x60
>>>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>> Call Trace:
>>>> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
>>>> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>>>> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
>>>> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
>>>> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
>>>> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
>>>> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
>>>> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
>>>> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
>>>> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
>>>> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>>>> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
>>>> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
>>>> Instruction dump:
>>>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
>>>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
>>> I see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's
>>> being called lots of times from a higher level.. Do the traces all look
>>> the same?
>> Hi Andrew,
>>
>> I see this call trace twice and both looks similar and on another reboot
>> the following trace is seen twice in different cpu
>>
>> BUG: soft lockup detected on CPU#3!
>> Call Trace:
>> [C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable)
>> [C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c
>> [C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30
>> [C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488
>> [C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100
>> --- Exception: 901 at .ioread8+0x14/0x60
>> LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>> [C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable)
>> [C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>> [C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
>> [C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc
>> [C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c
>> [C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138
>> [C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4
>> [C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40
>> [C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c
>> [C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4
>> [C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4
>> [C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
>> [C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998
>> [C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40
>>
>
> hm, odd.
>
> Can you look up sym_hcb_attach+0x1194/0x1384 in gdb? Something like
>
Hi Andrew,
I tried with 2.6.24-rc3-git3 and got the following trace
BUG: soft lockup - CPU#2 stuck for 11s! [insmod:375]
NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
REGS: c00000077ca3b0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git3-autokern1)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
TASK = c00000077cc58000[375] 'insmod' THREAD: c00000077ca38000 CPU: 2
GPR00: d0000000001414fc c00000077ca3b330 c00000000052b880 d000080080002014
GPR04: d00008008000202c 0000000000000000 c00000077c82eb00 d00000000014ce54
GPR08: c00000077c82e63c 0000000000000000 000000000000002a c00000000002f018
GPR12: d000000000143610 c000000000473f80
NIP [c00000000002f02c] .ioread8+0x14/0x60
LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
Call Trace:
[c00000077ca3b330] [c00000077ca3b3c0] 0xc00000077ca3b3c0 (unreliable)
[c00000077ca3b3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
[c00000077ca3b470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
[c00000077ca3b710] [c0000000001bc098] .pci_device_probe+0x124/0x1b0
[c00000077ca3b7b0] [c0000000002210c4] .driver_probe_device+0x144/0x20c
[c00000077ca3b850] [c0000000002213dc] .__driver_attach+0xcc/0x154
[c00000077ca3b8e0] [c00000000021ff20] .bus_for_each_dev+0x7c/0xd4
[c00000077ca3b9a0] [c000000000220e28] .driver_attach+0x28/0x40
[c00000077ca3ba20] [c000000000220464] .bus_add_driver+0x90/0x228
[c00000077ca3bac0] [c0000000002217e4] .driver_register+0x94/0xb0
[c00000077ca3bb40] [c0000000001bc3b0] .__pci_register_driver+0x6c/0xcc
[c00000077ca3bbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
[c00000077ca3bc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
[c00000077ca3be30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
The gdb list the following for the above trace
0xa4fc is in sym_hcb_attach (drivers/scsi/sym53c8xx_2/sym_hipd.c:1041).
1036 OUTL_DSP(np, pc);
1037 /*
1038 * Wait 'til done (with timeout)
1039 */
1040 for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
1041 if (INB(np, nc_istat) & (INTF|SIP|DIP))
1042 break;
1043 if (i>=SYM_SNOOP_TIMEOUT) {
1044 printf ("CACHE TEST FAILED: timeout.\n");
1045 return (0x20);
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-29 6:31 ` Kamalesh Babulal
@ 2007-11-29 8:35 ` Andrew Morton
2007-11-29 9:13 ` Kamalesh Babulal
2007-11-30 6:39 ` Kyle McMartin
0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2007-11-29 8:35 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: linux-scsi, Matthew, Wilcox, LKML, Rafael J. Wysocki,
linuxppc-dev, Andy, Balbir Singh
On Thu, 29 Nov 2007 12:01:08 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Andrew Morton wrote:
> > On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> >
> >> Andrew Morton wrote:
> >>> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> >>>
> >>>> Hi,
> >>> (cc linux-scsi, for sym53c8xx)
> >>>
> >>>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
> >>> I assume this is a post-2.6.23 regression?
> >>>
> >>>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
> >>>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
> >>>> REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
> >>>> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
> >>>> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
> >>>> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
> >>>> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
> >>>> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
> >>>> GPR12: d000000000143610 c000000000473d00
> >>>> NIP [c00000000002f02c] .ioread8+0x14/0x60
> >>>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> >>>> Call Trace:
> >>>> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
> >>>> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> >>>> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
> >>>> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
> >>>> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
> >>>> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
> >>>> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
> >>>> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
> >>>> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
> >>>> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
> >>>> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
> >>>> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
> >>>> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
> >>>> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
> >>>> Instruction dump:
> >>>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
> >>>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
> >>> I see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's
> >>> being called lots of times from a higher level.. Do the traces all look
> >>> the same?
> >> Hi Andrew,
> >>
> >> I see this call trace twice and both looks similar and on another reboot
> >> the following trace is seen twice in different cpu
> >>
> >> BUG: soft lockup detected on CPU#3!
> >> Call Trace:
> >> [C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable)
> >> [C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c
> >> [C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30
> >> [C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488
> >> [C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100
> >> --- Exception: 901 at .ioread8+0x14/0x60
> >> LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
> >> [C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable)
> >> [C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
> >> [C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
> >> [C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc
> >> [C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c
> >> [C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138
> >> [C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4
> >> [C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40
> >> [C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c
> >> [C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4
> >> [C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4
> >> [C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
> >> [C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998
> >> [C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40
> >>
> >
> > hm, odd.
> >
> > Can you look up sym_hcb_attach+0x1194/0x1384 in gdb? Something like
> >
> Hi Andrew,
>
> I tried with 2.6.24-rc3-git3 and got the following trace
>
> BUG: soft lockup - CPU#2 stuck for 11s! [insmod:375]
> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
> REGS: c00000077ca3b0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git3-autokern1)
> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
> TASK = c00000077cc58000[375] 'insmod' THREAD: c00000077ca38000 CPU: 2
> GPR00: d0000000001414fc c00000077ca3b330 c00000000052b880 d000080080002014
> GPR04: d00008008000202c 0000000000000000 c00000077c82eb00 d00000000014ce54
> GPR08: c00000077c82e63c 0000000000000000 000000000000002a c00000000002f018
> GPR12: d000000000143610 c000000000473f80
> NIP [c00000000002f02c] .ioread8+0x14/0x60
> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>
> Call Trace:
> [c00000077ca3b330] [c00000077ca3b3c0] 0xc00000077ca3b3c0 (unreliable)
> [c00000077ca3b3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
> [c00000077ca3b470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
> [c00000077ca3b710] [c0000000001bc098] .pci_device_probe+0x124/0x1b0
> [c00000077ca3b7b0] [c0000000002210c4] .driver_probe_device+0x144/0x20c
> [c00000077ca3b850] [c0000000002213dc] .__driver_attach+0xcc/0x154
> [c00000077ca3b8e0] [c00000000021ff20] .bus_for_each_dev+0x7c/0xd4
> [c00000077ca3b9a0] [c000000000220e28] .driver_attach+0x28/0x40
> [c00000077ca3ba20] [c000000000220464] .bus_add_driver+0x90/0x228
> [c00000077ca3bac0] [c0000000002217e4] .driver_register+0x94/0xb0
> [c00000077ca3bb40] [c0000000001bc3b0] .__pci_register_driver+0x6c/0xcc
> [c00000077ca3bbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
> [c00000077ca3bc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
> [c00000077ca3be30] [c00000000000872c] syscall_exit+0x0/0x40
>
> Instruction dump:
> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
>
> The gdb list the following for the above trace
>
> 0xa4fc is in sym_hcb_attach (drivers/scsi/sym53c8xx_2/sym_hipd.c:1041).
> 1036 OUTL_DSP(np, pc);
> 1037 /*
> 1038 * Wait 'til done (with timeout)
> 1039 */
> 1040 for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
> 1041 if (INB(np, nc_istat) & (INTF|SIP|DIP))
> 1042 break;
> 1043 if (i>=SYM_SNOOP_TIMEOUT) {
> 1044 printf ("CACHE TEST FAILED: timeout.\n");
> 1045 return (0x20);
>
doh, I missed that.
#define SYM_SNOOP_TIMEOUT (10000000)
ten million is close enough to infinity for me to assume that we broke the
driver and that's never going to terminate.
otoh, if that's true you should have got the "CACHE TEST FAILED: timeout"
message. Did you? And does the driver actually work OK after this?
If it is indeed expected that a ~10 second stall in there is correct
behaviour then all we need to do is do make that loop a bit smarter (10,000
msleep(1)'s, for example).
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-29 8:35 ` Andrew Morton
@ 2007-11-29 9:13 ` Kamalesh Babulal
2007-11-30 6:39 ` Kyle McMartin
1 sibling, 0 replies; 15+ messages in thread
From: Kamalesh Babulal @ 2007-11-29 9:13 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-scsi, Matthew Wilcox, LKML, Rafael J. Wysocki, linuxppc-dev,
Balbir Singh
Andrew Morton wrote:
> On Thu, 29 Nov 2007 12:01:08 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
>> Andrew Morton wrote:
>>> On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>>>
>>>> Andrew Morton wrote:
>>>>> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>>>>>
>>>>>> Hi,
>>>>> (cc linux-scsi, for sym53c8xx)
>>>>>
>>>>>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
>>>>> I assume this is a post-2.6.23 regression?
>>>>>
>>>>>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
>>>>>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
>>>>>> REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest)
>>>>>> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
>>>>>> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
>>>>>> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014
>>>>>> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54
>>>>>> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018
>>>>>> GPR12: d000000000143610 c000000000473d00
>>>>>> NIP [c00000000002f02c] .ioread8+0x14/0x60
>>>>>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>>>> Call Trace:
>>>>>> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
>>>>>> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>>>> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>>>>>> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
>>>>>> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
>>>>>> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
>>>>>> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
>>>>>> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
>>>>>> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
>>>>>> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
>>>>>> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
>>>>>> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>>>>>> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
>>>>>> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
>>>>>> Instruction dump:
>>>>>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
>>>>>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
>>>>> I see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's
>>>>> being called lots of times from a higher level.. Do the traces all look
>>>>> the same?
>>>> Hi Andrew,
>>>>
>>>> I see this call trace twice and both looks similar and on another reboot
>>>> the following trace is seen twice in different cpu
>>>>
>>>> BUG: soft lockup detected on CPU#3!
>>>> Call Trace:
>>>> [C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable)
>>>> [C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c
>>>> [C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30
>>>> [C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488
>>>> [C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100
>>>> --- Exception: 901 at .ioread8+0x14/0x60
>>>> LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>>>> [C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable)
>>>> [C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>>>> [C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
>>>> [C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc
>>>> [C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c
>>>> [C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138
>>>> [C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4
>>>> [C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40
>>>> [C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c
>>>> [C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4
>>>> [C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4
>>>> [C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
>>>> [C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998
>>>> [C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40
>>>>
>>> hm, odd.
>>>
>>> Can you look up sym_hcb_attach+0x1194/0x1384 in gdb? Something like
>>>
>> Hi Andrew,
>>
>> I tried with 2.6.24-rc3-git3 and got the following trace
>>
>> BUG: soft lockup - CPU#2 stuck for 11s! [insmod:375]
>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
>> REGS: c00000077ca3b0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git3-autokern1)
>> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000
>> TASK = c00000077cc58000[375] 'insmod' THREAD: c00000077ca38000 CPU: 2
>> GPR00: d0000000001414fc c00000077ca3b330 c00000000052b880 d000080080002014
>> GPR04: d00008008000202c 0000000000000000 c00000077c82eb00 d00000000014ce54
>> GPR08: c00000077c82e63c 0000000000000000 000000000000002a c00000000002f018
>> GPR12: d000000000143610 c000000000473f80
>> NIP [c00000000002f02c] .ioread8+0x14/0x60
>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>
>> Call Trace:
>> [c00000077ca3b330] [c00000077ca3b3c0] 0xc00000077ca3b3c0 (unreliable)
>> [c00000077ca3b3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>> [c00000077ca3b470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>> [c00000077ca3b710] [c0000000001bc098] .pci_device_probe+0x124/0x1b0
>> [c00000077ca3b7b0] [c0000000002210c4] .driver_probe_device+0x144/0x20c
>> [c00000077ca3b850] [c0000000002213dc] .__driver_attach+0xcc/0x154
>> [c00000077ca3b8e0] [c00000000021ff20] .bus_for_each_dev+0x7c/0xd4
>> [c00000077ca3b9a0] [c000000000220e28] .driver_attach+0x28/0x40
>> [c00000077ca3ba20] [c000000000220464] .bus_add_driver+0x90/0x228
>> [c00000077ca3bac0] [c0000000002217e4] .driver_register+0x94/0xb0
>> [c00000077ca3bb40] [c0000000001bc3b0] .__pci_register_driver+0x6c/0xcc
>> [c00000077ca3bbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>> [c00000077ca3bc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
>> [c00000077ca3be30] [c00000000000872c] syscall_exit+0x0/0x40
>>
>> Instruction dump:
>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
>>
>> The gdb list the following for the above trace
>>
>> 0xa4fc is in sym_hcb_attach (drivers/scsi/sym53c8xx_2/sym_hipd.c:1041).
>> 1036 OUTL_DSP(np, pc);
>> 1037 /*
>> 1038 * Wait 'til done (with timeout)
>> 1039 */
>> 1040 for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
>> 1041 if (INB(np, nc_istat) & (INTF|SIP|DIP))
>> 1042 break;
>> 1043 if (i>=SYM_SNOOP_TIMEOUT) {
>> 1044 printf ("CACHE TEST FAILED: timeout.\n");
>> 1045 return (0x20);
>>
>
> doh, I missed that.
>
> #define SYM_SNOOP_TIMEOUT (10000000)
>
> ten million is close enough to infinity for me to assume that we broke the
> driver and that's never going to terminate.
>
> otoh, if that's true you should have got the "CACHE TEST FAILED: timeout"
> message. Did you? And does the driver actually work OK after this?
Yes, i got that message
CACHE TEST FAILED: timeout.
sym1: CACHE INCORRECTLY CONFIGURED.
sym1: giving up ...
and the file system stress passes.
> If it is indeed expected that a ~10 second stall in there is correct
> behaviour then all we need to do is do make that loop a bit smarter (10,000
> msleep(1)'s, for example).
>
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-29 8:35 ` Andrew Morton
2007-11-29 9:13 ` Kamalesh Babulal
@ 2007-11-30 6:39 ` Kyle McMartin
2007-11-30 7:00 ` Andrew Morton
1 sibling, 1 reply; 15+ messages in thread
From: Kyle McMartin @ 2007-11-30 6:39 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-scsi, Matthew Wilcox, LKML, Kamalesh Babulal,
Rafael J. Wysocki, linuxppc-dev, Balbir Singh
On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> ten million is close enough to infinity for me to assume that we broke the
> driver and that's never going to terminate.
>
how about this? doesn't break things on my pa8800:
diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
index 463f119..ef01cb1 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
@@ -1037,10 +1037,13 @@ restart_test:
/*
* Wait 'til done (with timeout)
*/
- for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
+ do {
if (INB(np, nc_istat) & (INTF|SIP|DIP))
break;
- if (i>=SYM_SNOOP_TIMEOUT) {
+ msleep(10);
+ } while (i++ < SYM_SNOOP_TIMEOUT);
+
+ if (i >= SYM_SNOOP_TIMEOUT) {
printf ("CACHE TEST FAILED: timeout.\n");
return (0x20);
}
diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
index ad07880..85c483b 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
@@ -339,7 +339,7 @@
/*
* Misc.
*/
-#define SYM_SNOOP_TIMEOUT (10000000)
+#define SYM_SNOOP_TIMEOUT (1000)
#define BUS_8_BIT 0
#define BUS_16_BIT 1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-30 6:39 ` Kyle McMartin
@ 2007-11-30 7:00 ` Andrew Morton
2007-11-30 7:02 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-11-30 7:00 UTC (permalink / raw)
To: Kyle McMartin
Cc: linux-scsi, Matthew, Wilcox, LKML, Kamalesh Babulal,
Rafael J. Wysocki, linuxppc-dev, Balbir Singh
On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <kyle@mcmartin.ca> wrote:
> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> > ten million is close enough to infinity for me to assume that we broke the
> > driver and that's never going to terminate.
> >
>
> how about this? doesn't break things on my pa8800:
>
> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> index 463f119..ef01cb1 100644
> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> @@ -1037,10 +1037,13 @@ restart_test:
> /*
> * Wait 'til done (with timeout)
> */
> - for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
> + do {
> if (INB(np, nc_istat) & (INTF|SIP|DIP))
> break;
> - if (i>=SYM_SNOOP_TIMEOUT) {
> + msleep(10);
> + } while (i++ < SYM_SNOOP_TIMEOUT);
> +
> + if (i >= SYM_SNOOP_TIMEOUT) {
> printf ("CACHE TEST FAILED: timeout.\n");
> return (0x20);
> }
> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> index ad07880..85c483b 100644
> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> @@ -339,7 +339,7 @@
> /*
> * Misc.
> */
> -#define SYM_SNOOP_TIMEOUT (10000000)
> +#define SYM_SNOOP_TIMEOUT (1000)
> #define BUS_8_BIT 0
> #define BUS_16_BIT 1
>
That might be the fix, but do we know what we're actually fixing? afaik
2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
don't know why?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-30 7:00 ` Andrew Morton
@ 2007-11-30 7:02 ` Andrew Morton
2007-11-30 7:28 ` Kamalesh Babulal
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-11-30 7:02 UTC (permalink / raw)
To: Kyle McMartin, Kamalesh Babulal, LKML, linuxppc-dev,
Andy Whitcroft, Balbir Singh, linux-scsi, Rafael J. Wysocki,
Matthew Wilcox
On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <kyle@mcmartin.ca> wrote:
>
> > On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> > > ten million is close enough to infinity for me to assume that we broke the
> > > driver and that's never going to terminate.
> > >
> >
> > how about this? doesn't break things on my pa8800:
> >
> > diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > index 463f119..ef01cb1 100644
> > --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> > @@ -1037,10 +1037,13 @@ restart_test:
> > /*
> > * Wait 'til done (with timeout)
> > */
> > - for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
> > + do {
> > if (INB(np, nc_istat) & (INTF|SIP|DIP))
> > break;
> > - if (i>=SYM_SNOOP_TIMEOUT) {
> > + msleep(10);
> > + } while (i++ < SYM_SNOOP_TIMEOUT);
> > +
> > + if (i >= SYM_SNOOP_TIMEOUT) {
> > printf ("CACHE TEST FAILED: timeout.\n");
> > return (0x20);
> > }
> > diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > index ad07880..85c483b 100644
> > --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> > @@ -339,7 +339,7 @@
> > /*
> > * Misc.
> > */
> > -#define SYM_SNOOP_TIMEOUT (10000000)
> > +#define SYM_SNOOP_TIMEOUT (1000)
> > #define BUS_8_BIT 0
> > #define BUS_16_BIT 1
> >
>
> That might be the fix, but do we know what we're actually fixing? afaik
> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
> don't know why?
>
<looks at Subject:>
<Checks that Rafael was cc'ed>
So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-30 7:02 ` Andrew Morton
@ 2007-11-30 7:28 ` Kamalesh Babulal
2007-12-03 21:11 ` Andrew Morton
2007-12-04 12:20 ` Ingo Molnar
0 siblings, 2 replies; 15+ messages in thread
From: Kamalesh Babulal @ 2007-11-30 7:28 UTC (permalink / raw)
To: Andrew Morton
Cc: Rafael J. Wysocki, linux-scsi, Matthew Wilcox, LKML,
Kyle McMartin, linuxppc-dev, Balbir Singh
Andrew Morton wrote:
> On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <kyle@mcmartin.ca> wrote:
>>
>>> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
>>>> ten million is close enough to infinity for me to assume that we broke the
>>>> driver and that's never going to terminate.
>>>>
>>> how about this? doesn't break things on my pa8800:
>>>
>>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
>>> index 463f119..ef01cb1 100644
>>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
>>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
>>> @@ -1037,10 +1037,13 @@ restart_test:
>>> /*
>>> * Wait 'til done (with timeout)
>>> */
>>> - for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
>>> + do {
>>> if (INB(np, nc_istat) & (INTF|SIP|DIP))
>>> break;
>>> - if (i>=SYM_SNOOP_TIMEOUT) {
>>> + msleep(10);
>>> + } while (i++ < SYM_SNOOP_TIMEOUT);
>>> +
>>> + if (i >= SYM_SNOOP_TIMEOUT) {
>>> printf ("CACHE TEST FAILED: timeout.\n");
>>> return (0x20);
>>> }
>>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
>>> index ad07880..85c483b 100644
>>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
>>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
>>> @@ -339,7 +339,7 @@
>>> /*
>>> * Misc.
>>> */
>>> -#define SYM_SNOOP_TIMEOUT (10000000)
>>> +#define SYM_SNOOP_TIMEOUT (1000)
>>> #define BUS_8_BIT 0
>>> #define BUS_16_BIT 1
>>>
>> That might be the fix, but do we know what we're actually fixing? afaik
>> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
>> don't know why?
>>
>
> <looks at Subject:>
>
> <Checks that Rafael was cc'ed>
>
> So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-30 7:28 ` Kamalesh Babulal
@ 2007-12-03 21:11 ` Andrew Morton
2007-12-04 12:20 ` Ingo Molnar
1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2007-12-03 21:11 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: rjw, linux-scsi, willy, linux-kernel, kyle, linuxppc-dev, balbir
On Fri, 30 Nov 2007 12:58:06 +0530
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Andrew Morton wrote:
> > On Thu, 29 Nov 2007 23:00:47 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> >> On Fri, 30 Nov 2007 01:39:29 -0500 Kyle McMartin <kyle@mcmartin.ca> wrote:
> >>
> >>> On Thu, Nov 29, 2007 at 12:35:33AM -0800, Andrew Morton wrote:
> >>>> ten million is close enough to infinity for me to assume that we broke the
> >>>> driver and that's never going to terminate.
> >>>>
> >>> how about this? doesn't break things on my pa8800:
> >>>
> >>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> >>> index 463f119..ef01cb1 100644
> >>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
> >>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
> >>> @@ -1037,10 +1037,13 @@ restart_test:
> >>> /*
> >>> * Wait 'til done (with timeout)
> >>> */
> >>> - for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
> >>> + do {
> >>> if (INB(np, nc_istat) & (INTF|SIP|DIP))
> >>> break;
> >>> - if (i>=SYM_SNOOP_TIMEOUT) {
> >>> + msleep(10);
> >>> + } while (i++ < SYM_SNOOP_TIMEOUT);
> >>> +
> >>> + if (i >= SYM_SNOOP_TIMEOUT) {
> >>> printf ("CACHE TEST FAILED: timeout.\n");
> >>> return (0x20);
> >>> }
> >>> diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> >>> index ad07880..85c483b 100644
> >>> --- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
> >>> +++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
> >>> @@ -339,7 +339,7 @@
> >>> /*
> >>> * Misc.
> >>> */
> >>> -#define SYM_SNOOP_TIMEOUT (10000000)
> >>> +#define SYM_SNOOP_TIMEOUT (1000)
> >>> #define BUS_8_BIT 0
> >>> #define BUS_16_BIT 1
> >>>
> >> That might be the fix, but do we know what we're actually fixing? afaik
> >> 2.6.24-rc3 doesn't get this timeout, 2.6.24-rc3-mm2 does get it and we
> >> don't know why?
> >>
> >
> > <looks at Subject:>
> >
> > <Checks that Rafael was cc'ed>
> >
> > So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
>
> Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4.
>
There are effectively no drivers/scsi/ changes after 2.6.24-rc3 and we
don't (I believe) have a clue what caused this regression.
Can you please do a bisection search on this?
Thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-11-30 7:28 ` Kamalesh Babulal
2007-12-03 21:11 ` Andrew Morton
@ 2007-12-04 12:20 ` Ingo Molnar
2007-12-04 14:57 ` Kamalesh Babulal
1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2007-12-04 12:20 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Rafael J. Wysocki, linux-scsi, Matthew Wilcox, LKML,
Kyle McMartin, linuxppc-dev, Andrew Morton, Balbir Singh
* Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> > So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
>
> Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4.
just to make sure: this is a real lockup and failed bootup (or device
init), not just a message, right?
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-12-04 12:20 ` Ingo Molnar
@ 2007-12-04 14:57 ` Kamalesh Babulal
2007-12-04 20:55 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Kamalesh Babulal @ 2007-12-04 14:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rafael J. Wysocki, linux-scsi, Matthew Wilcox, LKML,
Kyle McMartin, linuxppc-dev, Andrew Morton, Balbir Singh
Ingo Molnar wrote:
> * Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
>>> So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
>> Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4.
>
> just to make sure: this is a real lockup and failed bootup (or device
> init), not just a message, right?
>
> Ingo
> --
Hi Ingo,
This softlockup is seen in the 2.6.24-rc4 either and looks like a message because
this is seen while running tbench and machine continues running other test's after
the softlockup messages and some times seen with the bootup, but the machines reaches the
login prompt and able to continue running tests.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG] 2.6.24-rc3-git2 softlockup detected
2007-12-04 14:57 ` Kamalesh Babulal
@ 2007-12-04 20:55 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2007-12-04 20:55 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Rafael J. Wysocki, linux-scsi, Matthew Wilcox, LKML,
Kyle McMartin, linuxppc-dev, Andrew Morton, Balbir Singh
* Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> Hi Ingo,
>
> This softlockup is seen in the 2.6.24-rc4 either and looks like a
> message because this is seen while running tbench and machine
> continues running other test's after the softlockup messages and some
> times seen with the bootup, but the machines reaches the login prompt
> and able to continue running tests.
do you know whether there's any true delay when this happens, or is it a
pure softlockup-detector false positive?
Ingo
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2007-12-04 20:55 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-28 6:29 [BUG] 2.6.24-rc3-git2 softlockup detected Kamalesh Babulal
2007-11-28 6:53 ` Andrew Morton
2007-11-28 7:17 ` Kamalesh Babulal
2007-11-28 7:25 ` Andrew Morton
2007-11-29 6:31 ` Kamalesh Babulal
2007-11-29 8:35 ` Andrew Morton
2007-11-29 9:13 ` Kamalesh Babulal
2007-11-30 6:39 ` Kyle McMartin
2007-11-30 7:00 ` Andrew Morton
2007-11-30 7:02 ` Andrew Morton
2007-11-30 7:28 ` Kamalesh Babulal
2007-12-03 21:11 ` Andrew Morton
2007-12-04 12:20 ` Ingo Molnar
2007-12-04 14:57 ` Kamalesh Babulal
2007-12-04 20:55 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).