* lockdep badness
@ 2008-07-24 19:23 Nathan Lynch
2008-07-24 19:38 ` Nathan Lynch
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Nathan Lynch @ 2008-07-24 19:23 UTC (permalink / raw)
To: linuxppc-dev
I'm seeing warnings from the lockdep code itself in recent kernels on
a Power6 blade (v2.6.26 and benh's -next branch).
Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
A couple of stack traces below, the first is from benh's tree, the
second is from 2.6.26. The lockdep self-tests all pass at boot.
RTAS daemon started
RTAS: event: 295, Type: Dump Notification Event, Severity: 2
------------[ cut here ]------------
Badness at kernel/lockdep.c:2719
NIP: c0000000000b06dc LR: c0000000000b06c0 CTR: 0000000000000000
REGS: c0000000e787b9a0 TRAP: 0700 Not tainted (2.6.26)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000042 XER: 20000000
TASK = c0000000e91a6100[428] 'rtasd' THREAD: c0000000e7878000 CPU: 1
GPR00: 0000000000000000 c0000000e787bc20 c000000000a71270 0000000000000001
GPR04: 0000000000000001 c00000000004d648 0000040100000000 0000040100000000
GPR08: 0000000000000000 c000000000f1f858 0000000000000020 0000000000000001
GPR12: 0000000000010000 c000000000aac600 00000000002146f4 0000000000214744
GPR16: 4000000002040000 00000000029015b0 00000000002a3c00 c0000000008cc988
GPR20: 0000000002901598 0000000000000000 0000000000000124 0000000000000002
GPR24: 0000000000000001 0000000000000008 0000000000000001 c0000000009fdf00
GPR28: c00000000004d648 c000000000943e98 c0000000009d9b40 c0000000e787bc20
NIP [c0000000000b06dc] .check_flags+0x9c/0x174
LR [c0000000000b06c0] .check_flags+0x80/0x174
Call Trace:
[c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
[c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
[c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
[c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
[c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
[c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
Instruction dump:
e92d01b0 80090904 2f800000 40be002c 481ece4d 60000000 2fa30000 419e00c4
e93e80c0 80090000 2f800000 409e00b4 <0fe00000> 480000ac 78290464 80090014
possible reason: unannotated irqs-on.
irq event stamp: 20
hardirqs last enabled at (19): [<c0000000000b405c>] .trace_hardirqs_on+0x1c/0x3
0
hardirqs last disabled at (20): [<c0000000000b0d80>] .trace_hardirqs_off+0x1c/0x
30
softirqs last enabled at (0): [<c00000000008293c>] .copy_process+0x3c8/0x1040
softirqs last disabled at (0): [<0000000000000000>] 0x0
===============
ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
ipr 0000:00:01.0: Found IOA with IRQ: 289
ipr 0000:00:01.0: Starting IOA initialization sequence.
ipr 0000:00:01.0: Adapter firmware version: 02200023
ipr 0000:00:01.0: IOA initialized.
------------[ cut here ]------------
Badness at kernel/lockdep.c:2037
NIP: c0000000000b29fc LR: c0000000000b29e0 CTR: 0000000000000000
REGS: c00000000fffb890 TRAP: 0700 Not tainted (2.6.26)
MSR: 8000000000021032 <ME,IR,DR> CR: 28000022 XER: 00000004
TASK = c000000000996730[0] 'swapper' THREAD: c000000000a60000 CPU: 0
GPR00: 0000000000000000 c00000000fffbb10 c000000000a61ad8 0000000000000001
GPR04: 0000000000035112 c000000000426708 0000000000000000 0000000000000970
GPR08: 0000000000000000 c000000000ec6ad0 0000000000000001 0000000000000001
GPR12: 0000000000043bd8 c000000000a9d400 00000000002146f4 0000000000214744
GPR16: 4000000002040000 00000000028f37a8 00000000002a3c00 c0000000008b3790
GPR20: 00000000028f3790 0000000001a3f920 c0000000008b37a8 c0000000e6252000
GPR24: c0000000e6454a48 0000000000100100 0000000000200200 c0000000005d8824
GPR28: c000000000afba18 c000000000996730 c0000000009cb7f0 c00000000fffbb10
NIP [c0000000000b29fc] .trace_hardirqs_on+0x120/0x1b0
LR [c0000000000b29e0] .trace_hardirqs_on+0x104/0x1b0
Call Trace:
[c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
[c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
[c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
[c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
[c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
[c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
[c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
[c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
[c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
[c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
--- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
LR = .cpu_idle+0x140/0x210
[c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
[c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
[c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
Instruction dump:
e92d01b0 80090954 2f800000 41be002c 481e61fd 60000000 2fa30000 419e0080
e93e80c0 80090000 2f800000 409e0070 <0fe00000> 48000068 7fa3eb78 38800001
scsi0 : IBM 572C Storage Adapter
scsi 0:3:0:0: Direct-Access IBM-ESXS ST973402SS B522 PQ: 0 ANSI: 5
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep badness
2008-07-24 19:23 lockdep badness Nathan Lynch
@ 2008-07-24 19:38 ` Nathan Lynch
2008-07-24 22:47 ` Benjamin Herrenschmidt
2008-07-24 22:44 ` Benjamin Herrenschmidt
2008-07-25 12:59 ` Sebastien Dugue
2 siblings, 1 reply; 7+ messages in thread
From: Nathan Lynch @ 2008-07-24 19:38 UTC (permalink / raw)
To: linuxppc-dev
Nathan Lynch wrote:
>
> A couple of stack traces below, the first is from benh's tree, the
> second is from 2.6.26. The lockdep self-tests all pass at boot.
Sorry, should have pointed out the code that is warning more
specifically.
> RTAS daemon started
> RTAS: event: 295, Type: Dump Notification Event, Severity: 2
> ------------[ cut here ]------------
> Badness at kernel/lockdep.c:2719
check_flags():
if (irqs_disabled_flags(flags)) {
if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
printk("possible reason: unannotated
irqs-off.\n");
}
} else {
>>> if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
printk("possible reason: unannotated
irqs-on.\n");
}
}
> NIP: c0000000000b06dc LR: c0000000000b06c0 CTR: 0000000000000000
> REGS: c0000000e787b9a0 TRAP: 0700 Not tainted (2.6.26)
> MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000042 XER: 20000000
> TASK = c0000000e91a6100[428] 'rtasd' THREAD: c0000000e7878000 CPU: 1
> GPR00: 0000000000000000 c0000000e787bc20 c000000000a71270 0000000000000001
> GPR04: 0000000000000001 c00000000004d648 0000040100000000 0000040100000000
> GPR08: 0000000000000000 c000000000f1f858 0000000000000020 0000000000000001
> GPR12: 0000000000010000 c000000000aac600 00000000002146f4 0000000000214744
> GPR16: 4000000002040000 00000000029015b0 00000000002a3c00 c0000000008cc988
> GPR20: 0000000002901598 0000000000000000 0000000000000124 0000000000000002
> GPR24: 0000000000000001 0000000000000008 0000000000000001 c0000000009fdf00
> GPR28: c00000000004d648 c000000000943e98 c0000000009d9b40 c0000000e787bc20
> NIP [c0000000000b06dc] .check_flags+0x9c/0x174
> LR [c0000000000b06c0] .check_flags+0x80/0x174
> Call Trace:
> [c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
> [c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
> [c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
> [c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
> [c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
> [c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
> Instruction dump:
> e92d01b0 80090904 2f800000 40be002c 481ece4d 60000000 2fa30000 419e00c4
> e93e80c0 80090000 2f800000 409e00b4 <0fe00000> 480000ac 78290464 80090014
> possible reason: unannotated irqs-on.
> irq event stamp: 20
> hardirqs last enabled at (19): [<c0000000000b405c>] .trace_hardirqs_on+0x1c/0x3
> 0
> hardirqs last disabled at (20): [<c0000000000b0d80>] .trace_hardirqs_off+0x1c/0x
> 30
> softirqs last enabled at (0): [<c00000000008293c>] .copy_process+0x3c8/0x1040
> softirqs last disabled at (0): [<0000000000000000>] 0x0
>
> ===============
>
> ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
> ipr 0000:00:01.0: Found IOA with IRQ: 289
> ipr 0000:00:01.0: Starting IOA initialization sequence.
> ipr 0000:00:01.0: Adapter firmware version: 02200023
> ipr 0000:00:01.0: IOA initialized.
> ------------[ cut here ]------------
> Badness at kernel/lockdep.c:2037
trace_hardirqs_on():
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return;
>>> if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
return;
> NIP: c0000000000b29fc LR: c0000000000b29e0 CTR: 0000000000000000
> REGS: c00000000fffb890 TRAP: 0700 Not tainted (2.6.26)
> MSR: 8000000000021032 <ME,IR,DR> CR: 28000022 XER: 00000004
> TASK = c000000000996730[0] 'swapper' THREAD: c000000000a60000 CPU: 0
> GPR00: 0000000000000000 c00000000fffbb10 c000000000a61ad8 0000000000000001
> GPR04: 0000000000035112 c000000000426708 0000000000000000 0000000000000970
> GPR08: 0000000000000000 c000000000ec6ad0 0000000000000001 0000000000000001
> GPR12: 0000000000043bd8 c000000000a9d400 00000000002146f4 0000000000214744
> GPR16: 4000000002040000 00000000028f37a8 00000000002a3c00 c0000000008b3790
> GPR20: 00000000028f3790 0000000001a3f920 c0000000008b37a8 c0000000e6252000
> GPR24: c0000000e6454a48 0000000000100100 0000000000200200 c0000000005d8824
> GPR28: c000000000afba18 c000000000996730 c0000000009cb7f0 c00000000fffbb10
> NIP [c0000000000b29fc] .trace_hardirqs_on+0x120/0x1b0
> LR [c0000000000b29e0] .trace_hardirqs_on+0x104/0x1b0
> Call Trace:
> [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
> LR = .cpu_idle+0x140/0x210
> [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> Instruction dump:
> e92d01b0 80090954 2f800000 41be002c 481e61fd 60000000 2fa30000 419e0080
> e93e80c0 80090000 2f800000 409e0070 <0fe00000> 48000068 7fa3eb78 38800001
> scsi0 : IBM 572C Storage Adapter
> scsi 0:3:0:0: Direct-Access IBM-ESXS ST973402SS B522 PQ: 0 ANSI: 5
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep badness
2008-07-24 19:23 lockdep badness Nathan Lynch
2008-07-24 19:38 ` Nathan Lynch
@ 2008-07-24 22:44 ` Benjamin Herrenschmidt
2008-07-24 23:00 ` Nathan Lynch
2008-07-24 23:04 ` Arnd Bergmann
2008-07-25 12:59 ` Sebastien Dugue
2 siblings, 2 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-24 22:44 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev
On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote:
> I'm seeing warnings from the lockdep code itself in recent kernels on
> a Power6 blade (v2.6.26 and benh's -next branch).
>
> Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
>
> A couple of stack traces below, the first is from benh's tree, the
> second is from 2.6.26. The lockdep self-tests all pass at boot.
Interesting.
> [c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
> [c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
> [c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
> [c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
> [c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
> [c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
> Instruction dump:
This one is one I haven't managed to reproduce and didn't quite find out
what could be causing it, but it was already reported by Badari (and in
fact is referenced as a regression in Rafael list).
> Call Trace:
> [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
> LR = .cpu_idle+0x140/0x210
> [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> Instruction dump:
This one is new to me. I will have a look. What machine is this ?
I suspect the error is to do spin_lock/unlock_irq rather than
save/restore variants at IRQ time, which would be an IPR bug... or
rather something legal that Ingo decided shouldn't be anymore :-)
Cheers,
Ben.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep badness
2008-07-24 19:38 ` Nathan Lynch
@ 2008-07-24 22:47 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-24 22:47 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev
On Thu, 2008-07-24 at 14:38 -0500, Nathan Lynch wrote:
> Nathan Lynch wrote:
> >
> > A couple of stack traces below, the first is from benh's tree, the
> > second is from 2.6.26. The lockdep self-tests all pass at boot.
>
> Sorry, should have pointed out the code that is warning more
> specifically.
>
>
> > RTAS daemon started
> > RTAS: event: 295, Type: Dump Notification Event, Severity: 2
> > ------------[ cut here ]------------
> > Badness at kernel/lockdep.c:2719
>
> check_flags():
>
> if (irqs_disabled_flags(flags)) {
> if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
> printk("possible reason: unannotated
> irqs-off.\n");
> }
> } else {
> >>> if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
> printk("possible reason: unannotated
> irqs-on.\n");
> }
> }
Yes, it looks like at some point the IRQ state gets out of sync, and I
haven't found out where yet. Having a repro case would help, what is the
setup of your machine ? I seems to be related to having an RTAS event at
boot.
> > ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
> > ipr 0000:00:01.0: Found IOA with IRQ: 289
> > ipr 0000:00:01.0: Starting IOA initialization sequence.
> > ipr 0000:00:01.0: Adapter firmware version: 02200023
> > ipr 0000:00:01.0: IOA initialized.
> > ------------[ cut here ]------------
> > Badness at kernel/lockdep.c:2037
>
> trace_hardirqs_on():
>
> if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
> return;
> >>> if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
> return;
As I said before, I'm not 100% sure what that is, it smells like
something legal that Ingo made not so anymore but I need to double
check.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep badness
2008-07-24 22:44 ` Benjamin Herrenschmidt
@ 2008-07-24 23:00 ` Nathan Lynch
2008-07-24 23:04 ` Arnd Bergmann
1 sibling, 0 replies; 7+ messages in thread
From: Nathan Lynch @ 2008-07-24 23:00 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Benjamin Herrenschmidt wrote:
> On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote:
> > I'm seeing warnings from the lockdep code itself in recent kernels on
> > a Power6 blade (v2.6.26 and benh's -next branch).
> >
> > Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
> >
> > A couple of stack traces below, the first is from benh's tree, the
> > second is from 2.6.26. The lockdep self-tests all pass at boot.
>
> Interesting.
>
> > [c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
> > [c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
> > [c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
> > [c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
> > [c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
> > [c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
> > Instruction dump:
>
> This one is one I haven't managed to reproduce and didn't quite find out
> what could be causing it, but it was already reported by Badari (and in
> fact is referenced as a regression in Rafael list).
Okay.
> > Call Trace:
> > [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> > [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> > [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> > [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> > [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> > [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> > [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> > [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> > [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> > [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> > --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
> > LR = .cpu_idle+0x140/0x210
> > [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> > [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> > [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> > Instruction dump:
>
> This one is new to me. I will have a look. What machine is this ?
Power6 blade - JS22 (four cores), with single disk attached via IPR,
HEA for network... nothing exotic, I guess. Not sure how recreatable
the ipr trace is, I've only seen it once (and with 2.6.26 only). The
rtasd trace is pretty consistent on powerpc/next.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep badness
2008-07-24 22:44 ` Benjamin Herrenschmidt
2008-07-24 23:00 ` Nathan Lynch
@ 2008-07-24 23:04 ` Arnd Bergmann
1 sibling, 0 replies; 7+ messages in thread
From: Arnd Bergmann @ 2008-07-24 23:04 UTC (permalink / raw)
To: linuxppc-dev, benh; +Cc: Brian King, Nathan Lynch
On Friday 25 July 2008, Benjamin Herrenschmidt wrote:
> > Call Trace:
> > [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> > [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> > [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> > [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> > [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> > [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> > [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> > [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> > [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> > [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> > --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
> > =A0 =A0 LR =3D .cpu_idle+0x140/0x210
> > [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> > [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> > [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> > Instruction dump:
>=20
> This one is new to me. I will have a look. What machine is this ?
>=20
> I suspect the error is to do spin_lock/unlock_irq rather than
> save/restore variants at IRQ time, which would be an IPR bug... or
> rather something legal that Ingo decided shouldn't be anymore :-)
Almost: The ipr_ioa_reset_done function does
spin_unlock_irq(ioa_cfg->host->host_lock);
scsi_unblock_requests(ioa_cfg->host);
spin_lock_irq(ioa_cfg->host->host_lock);
It seem that it is always called with interrupts disabled and
ioa_cfg->host->host_lock held (otherwise we would get a different
warning), so to get rid of the lockdep warning, it should be
replaced with
spin_unlock(ioa_cfg->host->host_lock);
scsi_unblock_requests(ioa_cfg->host);
spin_lock(ioa_cfg->host->host_lock);
I.e. leave the interrupts off, but still give up the lock. It
still feels wrong to do this, but I don't understand much about
the driver either.
Arnd <><
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep badness
2008-07-24 19:23 lockdep badness Nathan Lynch
2008-07-24 19:38 ` Nathan Lynch
2008-07-24 22:44 ` Benjamin Herrenschmidt
@ 2008-07-25 12:59 ` Sebastien Dugue
2 siblings, 0 replies; 7+ messages in thread
From: Sebastien Dugue @ 2008-07-25 12:59 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev, Arnd Bergmann
Hi,
On Thu, 24 Jul 2008 14:23:00 -0500 Nathan Lynch <ntl@pobox.com> wrote:
> I'm seeing warnings from the lockdep code itself in recent kernels on
> a Power6 blade (v2.6.26 and benh's -next branch).
>
> Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
>
> A couple of stack traces below, the first is from benh's tree, the
> second is from 2.6.26. The lockdep self-tests all pass at boot.
>
> RTAS daemon started
> RTAS: event: 295, Type: Dump Notification Event, Severity: 2
> ------------[ cut here ]------------
> Badness at kernel/lockdep.c:2719
<snip>
I'm observing nearly the same thing (minus the ipr related badness) here with
Ben's powerpc.git (master) on a JS22 blade:
Freeing initrd memory: 2676k freed
RTAS daemon started
RTAS: event: 94, Type: Platform Information Event, Severity: 1
------------[ cut here ]------------
Badness at kernel/lockdep.c:2079
NIP: c0000000000846dc LR: c0000000000846c0 CTR: 0000000000000000
REGS: c0000001e353f9c0 TRAP: 0700 Not tainted (2.6.26)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000042 XER: 20000010
TASK = c0000000efa58640[115] 'rtasd' THREAD: c0000001e353c000 CPU: 3
GPR00: 0000000000000000 c0000001e353fc40 c0000000004efe50 0000000000000001
GPR04: 0000000000000001 c00000000003cc94 0000000000000001 636820202d203230
GPR08: 3465303000000000 c0000000009627a8 c0000000efa58e88 0000000000000001
GPR12: 3465303038200d0a c000000000525900 c000000000395ba0 4000000000c00000
GPR16: c0000000003943c8 0000000000000000 000000000029d000 000000000101c920
GPR20: 0000000000000000 0000000000000001 0000000000000744 0000000000000001
GPR24: c000000000538cf0 c0000001e3e74000 c00000000041b48c c0000000004c0060
GPR28: c0000000000847a4 c0000000efa58640 c0000000004b18d0 c0000001e353fc40
NIP [c0000000000846dc] .trace_hardirqs_on_caller+0xc4/0x170
LR [c0000000000846c0] .trace_hardirqs_on_caller+0xa8/0x170
Call Trace:
[c0000001e353fcd0] [c0000000000847a4] .trace_hardirqs_on+0x1c/0x30
[c0000001e353fd50] [c0000000002df554] ._spin_unlock_irqrestore+0x68/0xb8
[c0000001e353fde0] [c00000000003cc94] .pSeries_log_error+0x3a4/0x400
[c0000001e353fef0] [c00000000003cf28] .rtasd+0x98/0x100
[c0000001e353ff90] [c0000000000295f8] .kernel_thread+0x4c/0x68
Instruction dump:
901d07ec 880d01ca 2fa00000 41be002c 481300a1 60000000 2fa30000 419e00a0
e93e80f8 80090000 2f800000 409e0090 <0fe00000> 48000088 e92d01a0 8009082c
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
msgmni has been set to 15071
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
...
Sebastien.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-07-25 12:58 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-24 19:23 lockdep badness Nathan Lynch
2008-07-24 19:38 ` Nathan Lynch
2008-07-24 22:47 ` Benjamin Herrenschmidt
2008-07-24 22:44 ` Benjamin Herrenschmidt
2008-07-24 23:00 ` Nathan Lynch
2008-07-24 23:04 ` Arnd Bergmann
2008-07-25 12:59 ` Sebastien Dugue
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).