linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* lockdep badness
@ 2008-07-24 19:23 Nathan Lynch
  2008-07-24 19:38 ` Nathan Lynch
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nathan Lynch @ 2008-07-24 19:23 UTC (permalink / raw)
  To: linuxppc-dev

I'm seeing warnings from the lockdep code itself in recent kernels on
a Power6 blade (v2.6.26 and benh's -next branch).

Something to do with powerpc's "lazy" interrupt-disabling, perhaps?

A couple of stack traces below, the first is from benh's tree, the
second is from 2.6.26.  The lockdep self-tests all pass at boot.

RTAS daemon started
RTAS: event: 295, Type: Dump Notification Event, Severity: 2
------------[ cut here ]------------
Badness at kernel/lockdep.c:2719
NIP: c0000000000b06dc LR: c0000000000b06c0 CTR: 0000000000000000
REGS: c0000000e787b9a0 TRAP: 0700   Not tainted  (2.6.26)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000042  XER: 20000000
TASK = c0000000e91a6100[428] 'rtasd' THREAD: c0000000e7878000 CPU: 1
GPR00: 0000000000000000 c0000000e787bc20 c000000000a71270 0000000000000001 
GPR04: 0000000000000001 c00000000004d648 0000040100000000 0000040100000000 
GPR08: 0000000000000000 c000000000f1f858 0000000000000020 0000000000000001 
GPR12: 0000000000010000 c000000000aac600 00000000002146f4 0000000000214744 
GPR16: 4000000002040000 00000000029015b0 00000000002a3c00 c0000000008cc988 
GPR20: 0000000002901598 0000000000000000 0000000000000124 0000000000000002 
GPR24: 0000000000000001 0000000000000008 0000000000000001 c0000000009fdf00 
GPR28: c00000000004d648 c000000000943e98 c0000000009d9b40 c0000000e787bc20 
NIP [c0000000000b06dc] .check_flags+0x9c/0x174
LR [c0000000000b06c0] .check_flags+0x80/0x174
Call Trace:
[c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
[c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
[c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
[c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
[c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
[c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
Instruction dump:
e92d01b0 80090904 2f800000 40be002c 481ece4d 60000000 2fa30000 419e00c4 
e93e80c0 80090000 2f800000 409e00b4 <0fe00000> 480000ac 78290464 80090014 
possible reason: unannotated irqs-on.
irq event stamp: 20
hardirqs last  enabled at (19): [<c0000000000b405c>] .trace_hardirqs_on+0x1c/0x3
0
hardirqs last disabled at (20): [<c0000000000b0d80>] .trace_hardirqs_off+0x1c/0x
30
softirqs last  enabled at (0): [<c00000000008293c>] .copy_process+0x3c8/0x1040
softirqs last disabled at (0): [<0000000000000000>] 0x0

===============

ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
ipr 0000:00:01.0: Found IOA with IRQ: 289
ipr 0000:00:01.0: Starting IOA initialization sequence.
ipr 0000:00:01.0: Adapter firmware version: 02200023
ipr 0000:00:01.0: IOA initialized.
------------[ cut here ]------------
Badness at kernel/lockdep.c:2037
NIP: c0000000000b29fc LR: c0000000000b29e0 CTR: 0000000000000000
REGS: c00000000fffb890 TRAP: 0700   Not tainted  (2.6.26)
MSR: 8000000000021032 <ME,IR,DR>  CR: 28000022  XER: 00000004
TASK = c000000000996730[0] 'swapper' THREAD: c000000000a60000 CPU: 0
GPR00: 0000000000000000 c00000000fffbb10 c000000000a61ad8 0000000000000001 
GPR04: 0000000000035112 c000000000426708 0000000000000000 0000000000000970 
GPR08: 0000000000000000 c000000000ec6ad0 0000000000000001 0000000000000001 
GPR12: 0000000000043bd8 c000000000a9d400 00000000002146f4 0000000000214744 
GPR16: 4000000002040000 00000000028f37a8 00000000002a3c00 c0000000008b3790 
GPR20: 00000000028f3790 0000000001a3f920 c0000000008b37a8 c0000000e6252000 
GPR24: c0000000e6454a48 0000000000100100 0000000000200200 c0000000005d8824 
GPR28: c000000000afba18 c000000000996730 c0000000009cb7f0 c00000000fffbb10 
NIP [c0000000000b29fc] .trace_hardirqs_on+0x120/0x1b0
LR [c0000000000b29e0] .trace_hardirqs_on+0x104/0x1b0
Call Trace:
[c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
[c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
[c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
[c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
[c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
[c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
[c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
[c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
[c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
[c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
--- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
    LR = .cpu_idle+0x140/0x210
[c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
[c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
[c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
Instruction dump:
e92d01b0 80090954 2f800000 41be002c 481e61fd 60000000 2fa30000 419e0080 
e93e80c0 80090000 2f800000 409e0070 <0fe00000> 48000068 7fa3eb78 38800001 
scsi0 : IBM 572C Storage Adapter
scsi 0:3:0:0: Direct-Access     IBM-ESXS ST973402SS       B522 PQ: 0 ANSI: 5

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep badness
  2008-07-24 19:23 lockdep badness Nathan Lynch
@ 2008-07-24 19:38 ` Nathan Lynch
  2008-07-24 22:47   ` Benjamin Herrenschmidt
  2008-07-24 22:44 ` Benjamin Herrenschmidt
  2008-07-25 12:59 ` Sebastien Dugue
  2 siblings, 1 reply; 7+ messages in thread
From: Nathan Lynch @ 2008-07-24 19:38 UTC (permalink / raw)
  To: linuxppc-dev

Nathan Lynch wrote:
> 
> A couple of stack traces below, the first is from benh's tree, the
> second is from 2.6.26.  The lockdep self-tests all pass at boot.

Sorry, should have pointed out the code that is warning more
specifically.


> RTAS daemon started
> RTAS: event: 295, Type: Dump Notification Event, Severity: 2
> ------------[ cut here ]------------
> Badness at kernel/lockdep.c:2719

check_flags():

        if (irqs_disabled_flags(flags)) {
                if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
                        printk("possible reason: unannotated
                irqs-off.\n");
                }
        } else {
>>>             if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
                        printk("possible reason: unannotated
                irqs-on.\n");
                }
        }

> NIP: c0000000000b06dc LR: c0000000000b06c0 CTR: 0000000000000000
> REGS: c0000000e787b9a0 TRAP: 0700   Not tainted  (2.6.26)
> MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000042  XER: 20000000
> TASK = c0000000e91a6100[428] 'rtasd' THREAD: c0000000e7878000 CPU: 1
> GPR00: 0000000000000000 c0000000e787bc20 c000000000a71270 0000000000000001 
> GPR04: 0000000000000001 c00000000004d648 0000040100000000 0000040100000000 
> GPR08: 0000000000000000 c000000000f1f858 0000000000000020 0000000000000001 
> GPR12: 0000000000010000 c000000000aac600 00000000002146f4 0000000000214744 
> GPR16: 4000000002040000 00000000029015b0 00000000002a3c00 c0000000008cc988 
> GPR20: 0000000002901598 0000000000000000 0000000000000124 0000000000000002 
> GPR24: 0000000000000001 0000000000000008 0000000000000001 c0000000009fdf00 
> GPR28: c00000000004d648 c000000000943e98 c0000000009d9b40 c0000000e787bc20 
> NIP [c0000000000b06dc] .check_flags+0x9c/0x174
> LR [c0000000000b06c0] .check_flags+0x80/0x174
> Call Trace:
> [c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
> [c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
> [c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
> [c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
> [c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
> [c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
> Instruction dump:
> e92d01b0 80090904 2f800000 40be002c 481ece4d 60000000 2fa30000 419e00c4 
> e93e80c0 80090000 2f800000 409e00b4 <0fe00000> 480000ac 78290464 80090014 
> possible reason: unannotated irqs-on.
> irq event stamp: 20
> hardirqs last  enabled at (19): [<c0000000000b405c>] .trace_hardirqs_on+0x1c/0x3
> 0
> hardirqs last disabled at (20): [<c0000000000b0d80>] .trace_hardirqs_off+0x1c/0x
> 30
> softirqs last  enabled at (0): [<c00000000008293c>] .copy_process+0x3c8/0x1040
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> 
> ===============
> 
> ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
> ipr 0000:00:01.0: Found IOA with IRQ: 289
> ipr 0000:00:01.0: Starting IOA initialization sequence.
> ipr 0000:00:01.0: Adapter firmware version: 02200023
> ipr 0000:00:01.0: IOA initialized.
> ------------[ cut here ]------------
> Badness at kernel/lockdep.c:2037

trace_hardirqs_on():

        if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
                return;
>>>     if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
                return;


> NIP: c0000000000b29fc LR: c0000000000b29e0 CTR: 0000000000000000
> REGS: c00000000fffb890 TRAP: 0700   Not tainted  (2.6.26)
> MSR: 8000000000021032 <ME,IR,DR>  CR: 28000022  XER: 00000004
> TASK = c000000000996730[0] 'swapper' THREAD: c000000000a60000 CPU: 0
> GPR00: 0000000000000000 c00000000fffbb10 c000000000a61ad8 0000000000000001 
> GPR04: 0000000000035112 c000000000426708 0000000000000000 0000000000000970 
> GPR08: 0000000000000000 c000000000ec6ad0 0000000000000001 0000000000000001 
> GPR12: 0000000000043bd8 c000000000a9d400 00000000002146f4 0000000000214744 
> GPR16: 4000000002040000 00000000028f37a8 00000000002a3c00 c0000000008b3790 
> GPR20: 00000000028f3790 0000000001a3f920 c0000000008b37a8 c0000000e6252000 
> GPR24: c0000000e6454a48 0000000000100100 0000000000200200 c0000000005d8824 
> GPR28: c000000000afba18 c000000000996730 c0000000009cb7f0 c00000000fffbb10 
> NIP [c0000000000b29fc] .trace_hardirqs_on+0x120/0x1b0
> LR [c0000000000b29e0] .trace_hardirqs_on+0x104/0x1b0
> Call Trace:
> [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
>     LR = .cpu_idle+0x140/0x210
> [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> Instruction dump:
> e92d01b0 80090954 2f800000 41be002c 481e61fd 60000000 2fa30000 419e0080 
> e93e80c0 80090000 2f800000 409e0070 <0fe00000> 48000068 7fa3eb78 38800001 
> scsi0 : IBM 572C Storage Adapter
> scsi 0:3:0:0: Direct-Access     IBM-ESXS ST973402SS       B522 PQ: 0 ANSI: 5

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep badness
  2008-07-24 19:23 lockdep badness Nathan Lynch
  2008-07-24 19:38 ` Nathan Lynch
@ 2008-07-24 22:44 ` Benjamin Herrenschmidt
  2008-07-24 23:00   ` Nathan Lynch
  2008-07-24 23:04   ` Arnd Bergmann
  2008-07-25 12:59 ` Sebastien Dugue
  2 siblings, 2 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-24 22:44 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev

On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote:
> I'm seeing warnings from the lockdep code itself in recent kernels on
> a Power6 blade (v2.6.26 and benh's -next branch).
> 
> Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
> 
> A couple of stack traces below, the first is from benh's tree, the
> second is from 2.6.26.  The lockdep self-tests all pass at boot.

Interesting.

> [c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
> [c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
> [c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
> [c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
> [c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
> [c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
> Instruction dump:

This one is one I haven't managed to reproduce and didn't quite find out
what could be causing it, but it was already reported by Badari (and in
fact is referenced as a regression in Rafael list).

> Call Trace:
> [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
>     LR = .cpu_idle+0x140/0x210
> [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> Instruction dump:

This one is new to me. I will have a look. What machine is this ?

I suspect the error is to do spin_lock/unlock_irq rather than
save/restore variants at IRQ time, which would be an IPR bug... or
rather something legal that Ingo decided shouldn't be anymore :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep badness
  2008-07-24 19:38 ` Nathan Lynch
@ 2008-07-24 22:47   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-24 22:47 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev

On Thu, 2008-07-24 at 14:38 -0500, Nathan Lynch wrote:
> Nathan Lynch wrote:
> > 
> > A couple of stack traces below, the first is from benh's tree, the
> > second is from 2.6.26.  The lockdep self-tests all pass at boot.
> 
> Sorry, should have pointed out the code that is warning more
> specifically.
> 
> 
> > RTAS daemon started
> > RTAS: event: 295, Type: Dump Notification Event, Severity: 2
> > ------------[ cut here ]------------
> > Badness at kernel/lockdep.c:2719
> 
> check_flags():
> 
>         if (irqs_disabled_flags(flags)) {
>                 if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
>                         printk("possible reason: unannotated
>                 irqs-off.\n");
>                 }
>         } else {
> >>>             if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
>                         printk("possible reason: unannotated
>                 irqs-on.\n");
>                 }
>         }

Yes, it looks like at some point the IRQ state gets out of sync, and I
haven't found out where yet. Having a repro case would help, what is the
setup of your machine ? I seems to be related to having an RTAS event at
boot.


> > ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
> > ipr 0000:00:01.0: Found IOA with IRQ: 289
> > ipr 0000:00:01.0: Starting IOA initialization sequence.
> > ipr 0000:00:01.0: Adapter firmware version: 02200023
> > ipr 0000:00:01.0: IOA initialized.
> > ------------[ cut here ]------------
> > Badness at kernel/lockdep.c:2037
> 
> trace_hardirqs_on():
> 
>         if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>                 return;
> >>>     if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
>                 return;

As I said before, I'm not 100% sure what that is, it smells like
something legal that Ingo made not so anymore but I need to double
check.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep badness
  2008-07-24 22:44 ` Benjamin Herrenschmidt
@ 2008-07-24 23:00   ` Nathan Lynch
  2008-07-24 23:04   ` Arnd Bergmann
  1 sibling, 0 replies; 7+ messages in thread
From: Nathan Lynch @ 2008-07-24 23:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Benjamin Herrenschmidt wrote:
> On Thu, 2008-07-24 at 14:23 -0500, Nathan Lynch wrote:
> > I'm seeing warnings from the lockdep code itself in recent kernels on
> > a Power6 blade (v2.6.26 and benh's -next branch).
> > 
> > Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
> > 
> > A couple of stack traces below, the first is from benh's tree, the
> > second is from 2.6.26.  The lockdep self-tests all pass at boot.
> 
> Interesting.
> 
> > [c0000000e787bc20] [c0000000e787bc70] 0xc0000000e787bc70 (unreliable)
> > [c0000000e787bca0] [c0000000000b5ac8] .lock_release+0x7c/0x208
> > [c0000000e787bd50] [c0000000005e12c0] ._spin_unlock_irqrestore+0x34/0x94
> > [c0000000e787bde0] [c00000000004d648] .pSeries_log_error+0x380/0x3f0
> > [c0000000e787bef0] [c00000000004d8e4] .rtasd+0x98/0x100
> > [c0000000e787bf90] [c000000000029d20] .kernel_thread+0x4c/0x68
> > Instruction dump:
> 
> This one is one I haven't managed to reproduce and didn't quite find out
> what could be causing it, but it was already reported by Badari (and in
> fact is referenced as a regression in Rafael list).

Okay.



> > Call Trace:
> > [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> > [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> > [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> > [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> > [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> > [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> > [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> > [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> > [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> > [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> > --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
> >     LR = .cpu_idle+0x140/0x210
> > [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> > [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> > [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> > Instruction dump:
> 
> This one is new to me. I will have a look. What machine is this ?

Power6 blade - JS22 (four cores), with single disk attached via IPR,
HEA for network... nothing exotic, I guess.  Not sure how recreatable
the ipr trace is, I've only seen it once (and with 2.6.26 only).  The
rtasd trace is pretty consistent on powerpc/next.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep badness
  2008-07-24 22:44 ` Benjamin Herrenschmidt
  2008-07-24 23:00   ` Nathan Lynch
@ 2008-07-24 23:04   ` Arnd Bergmann
  1 sibling, 0 replies; 7+ messages in thread
From: Arnd Bergmann @ 2008-07-24 23:04 UTC (permalink / raw)
  To: linuxppc-dev, benh; +Cc: Brian King, Nathan Lynch

On Friday 25 July 2008, Benjamin Herrenschmidt wrote:
> > Call Trace:
> > [c00000000fffbb10] [c00000000fffbbb0] 0xc00000000fffbbb0 (unreliable)
> > [c00000000fffbbb0] [c0000000005d8824] ._spin_unlock_irq+0x40/0x68
> > [c00000000fffbc40] [c000000000426708] .ipr_ioa_reset_done+0x218/0x2ac
> > [c00000000fffbd00] [c00000000041bdb8] .ipr_reset_ioa_job+0xc8/0xf4
> > [c00000000fffbd90] [c000000000424ffc] .ipr_isr+0x280/0x628
> > [c00000000fffbe50] [c0000000000ccc70] .handle_IRQ_event+0x58/0xd4
> > [c00000000fffbef0] [c0000000000cef4c] .handle_fasteoi_irq+0x128/0x1c8
> > [c00000000fffbf90] [c000000000029918] .call_handle_irq+0x1c/0x2c
> > [c000000000a63a20] [c00000000000d9cc] .do_IRQ+0x138/0x248
> > [c000000000a63ad0] [c000000000004ca8] hardware_interrupt_entry+0x28/0x2c
> > --- Exception: 501 at .raw_local_irq_restore+0x8c/0xa4
> > =A0 =A0 LR =3D .cpu_idle+0x140/0x210
> > [c000000000a63e60] [c0000000005da07c] .rest_init+0x7c/0x98
> > [c000000000a63ee0] [c000000000866f10] .start_kernel+0x488/0x4b0
> > [c000000000a63f90] [c000000000008584] .start_here_common+0x4c/0xc8
> > Instruction dump:
>=20
> This one is new to me. I will have a look. What machine is this ?
>=20
> I suspect the error is to do spin_lock/unlock_irq rather than
> save/restore variants at IRQ time, which would be an IPR bug... or
> rather something legal that Ingo decided shouldn't be anymore :-)

Almost: The ipr_ioa_reset_done function does

	spin_unlock_irq(ioa_cfg->host->host_lock);
	scsi_unblock_requests(ioa_cfg->host);
	spin_lock_irq(ioa_cfg->host->host_lock);

It seem that it is always called with interrupts disabled and
ioa_cfg->host->host_lock held (otherwise we would get a different
warning), so to get rid of the lockdep warning, it should be
replaced with

	spin_unlock(ioa_cfg->host->host_lock);
	scsi_unblock_requests(ioa_cfg->host);
	spin_lock(ioa_cfg->host->host_lock);

I.e. leave the interrupts off, but still give up the lock. It
still feels wrong to do this, but I don't understand much about
the driver either.

	Arnd <><

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep badness
  2008-07-24 19:23 lockdep badness Nathan Lynch
  2008-07-24 19:38 ` Nathan Lynch
  2008-07-24 22:44 ` Benjamin Herrenschmidt
@ 2008-07-25 12:59 ` Sebastien Dugue
  2 siblings, 0 replies; 7+ messages in thread
From: Sebastien Dugue @ 2008-07-25 12:59 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev, Arnd Bergmann


  Hi,

On Thu, 24 Jul 2008 14:23:00 -0500 Nathan Lynch <ntl@pobox.com> wrote:

> I'm seeing warnings from the lockdep code itself in recent kernels on
> a Power6 blade (v2.6.26 and benh's -next branch).
> 
> Something to do with powerpc's "lazy" interrupt-disabling, perhaps?
> 
> A couple of stack traces below, the first is from benh's tree, the
> second is from 2.6.26.  The lockdep self-tests all pass at boot.
> 
> RTAS daemon started
> RTAS: event: 295, Type: Dump Notification Event, Severity: 2
> ------------[ cut here ]------------
> Badness at kernel/lockdep.c:2719

<snip>

  I'm observing nearly the same thing (minus the ipr related badness) here with
Ben's powerpc.git (master) on a JS22 blade:

Freeing initrd memory: 2676k freed
RTAS daemon started
RTAS: event: 94, Type: Platform Information Event, Severity: 1
------------[ cut here ]------------
Badness at kernel/lockdep.c:2079
NIP: c0000000000846dc LR: c0000000000846c0 CTR: 0000000000000000
REGS: c0000001e353f9c0 TRAP: 0700   Not tainted  (2.6.26)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000042  XER: 20000010
TASK = c0000000efa58640[115] 'rtasd' THREAD: c0000001e353c000 CPU: 3
GPR00: 0000000000000000 c0000001e353fc40 c0000000004efe50 0000000000000001 
GPR04: 0000000000000001 c00000000003cc94 0000000000000001 636820202d203230 
GPR08: 3465303000000000 c0000000009627a8 c0000000efa58e88 0000000000000001 
GPR12: 3465303038200d0a c000000000525900 c000000000395ba0 4000000000c00000 
GPR16: c0000000003943c8 0000000000000000 000000000029d000 000000000101c920 
GPR20: 0000000000000000 0000000000000001 0000000000000744 0000000000000001 
GPR24: c000000000538cf0 c0000001e3e74000 c00000000041b48c c0000000004c0060 
GPR28: c0000000000847a4 c0000000efa58640 c0000000004b18d0 c0000001e353fc40 
NIP [c0000000000846dc] .trace_hardirqs_on_caller+0xc4/0x170
LR [c0000000000846c0] .trace_hardirqs_on_caller+0xa8/0x170
Call Trace:
[c0000001e353fcd0] [c0000000000847a4] .trace_hardirqs_on+0x1c/0x30
[c0000001e353fd50] [c0000000002df554] ._spin_unlock_irqrestore+0x68/0xb8
[c0000001e353fde0] [c00000000003cc94] .pSeries_log_error+0x3a4/0x400
[c0000001e353fef0] [c00000000003cf28] .rtasd+0x98/0x100
[c0000001e353ff90] [c0000000000295f8] .kernel_thread+0x4c/0x68
Instruction dump:
901d07ec 880d01ca 2fa00000 41be002c 481300a1 60000000 2fa30000 419e00a0 
e93e80f8 80090000 2f800000 409e0090 <0fe00000> 48000088 e92d01a0 8009082c 
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
msgmni has been set to 15071
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)

 ...


  Sebastien.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-07-25 12:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-24 19:23 lockdep badness Nathan Lynch
2008-07-24 19:38 ` Nathan Lynch
2008-07-24 22:47   ` Benjamin Herrenschmidt
2008-07-24 22:44 ` Benjamin Herrenschmidt
2008-07-24 23:00   ` Nathan Lynch
2008-07-24 23:04   ` Arnd Bergmann
2008-07-25 12:59 ` Sebastien Dugue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).