cciss: WARNING/BUG in do_cciss

linux-scsi.vger.kernel.org archive mirror

* cciss: WARNING/BUG in do_cciss_intr (it's back)
@ 2009-02-04 17:45 Randy Dunlap
  2009-02-05 18:16 ` Miller, Mike (OS Dev)
  0 siblings, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2009-02-04 17:45 UTC (permalink / raw)
  To: Miller, Mike (OS Dev), iss_storagedev, scsi,
	Linux Kernel Mailing List

Hi Mike,

Was there any debugging code added to try to help with this problem?
Or is that the WARNING before the BUG?


Booting 2.6.29-rc3-git6 oopsed with:

calling  cciss_init+0x0/0x2e [cciss] @ 733
HP CISS Driver (v 3.6.20)
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 54
cciss 0000:42:08.0: PCI INT A -> Link[LNKA] -> GSI 54 (level, high) -> IRQ 54
cciss 0000:42:08.0: irq 56 for MSI/MSI-X
IRQ 56/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
cciss0: <0x3238> at PCI 0000:42:08.0 IRQ 56 using DAC
------------[ cut here ]------------
WARNING: at drivers/block/cciss.c:225 do_cciss_intr+0x58f/0x99a [cciss]()
Hardware name: ProLiant BL685c G1
Modules linked in: cciss(+) ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.29-rc3-git6 #1
Call Trace:
 <IRQ>  [<ffffffff8023a741>] warn_slowpath+0xd3/0xf2
 [<ffffffff80243a44>] ? __mod_timer+0xc1/0xd3
 [<ffffffff8041469f>] ? smi_timeout+0xd9/0xe5
 [<ffffffff8024f86a>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff804145c6>] ? smi_timeout+0x0/0xe5
 [<ffffffffa0024c4b>] do_cciss_intr+0x58f/0x99a [cciss]
 [<ffffffff8026ed21>] handle_IRQ_event+0x27/0x57
 [<ffffffff8027057d>] handle_edge_irq+0xde/0x11f
 [<ffffffff8020e302>] do_IRQ+0xdc/0x152
 [<ffffffff8020ca13>] ret_from_intr+0x0/0xa
 <EOI> <4>---[ end trace a8b437cd48391e28 ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000f4
IP: [<ffffffffa0024c93>] do_cciss_intr+0x5d7/0x99a [cciss]
PGD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/block/ram15/dev
CPU 2
Modules linked in: cciss(+) ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Tainted: G        W  2.6.29-rc3-git6 #1
RIP: 0010:[<ffffffffa0024c93>]  [<ffffffffa0024c93>] do_cciss_intr+0x5d7/0x99a [cciss]
RSP: 0018:ffff88027f12fef0  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88007f840270 RCX: 0000000000013888
RDX: 0000000000008080 RSI: 0000000000000046 RDI: 0000000000000009
RBP: ffff88027f12ff20 R08: 000000447f12fa70 R09: ffff88017e540700
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007f8404b0
R13: ffff88027e1a0000 R14: 0000000000000000 R15: 0000000000000086
FS:  0000000000680850(0000) GS:ffff88017f121380(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000f4 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88017f164000, task ffff88017fa5d4c0)
Stack:
 0000000000000001 ffff88027f126280 0000000000000000 0000000000000000
 0000000000000038 0000000000000000 ffff88027f12ff50 ffffffff8026ed21
 ffffffff8076e000 0000000000000038 ffff88027f126280 ffffffff8076e054
Call Trace:
 <IRQ> <0> [<ffffffff8026ed21>] handle_IRQ_event+0x27/0x57
 [<ffffffff8027057d>] handle_edge_irq+0xde/0x11f
 [<ffffffff8020e302>] do_IRQ+0xdc/0x152
 [<ffffffff8020ca13>] ret_from_intr+0x0/0xa
 <EOI> <0>Code: 50 08 48 c7 83 40 02 00 00 00 00 00 00 49 c7 44 24 08 00 00 00 00 8b 83 34 02 00 00 85 c0 0f 85 49 03 00 00 4c 8b b3 50 02 00 00 <41> c7 86 f4 00 00 00 00 00 00 00 4c 8b 83 28 02 00 00 66 41 8b
RIP  [<ffffffffa0024c93>] do_cciss_intr+0x5d7/0x99a [cciss]
 RSP <ffff88027f12fef0>
CR2: 00000000000000f4
---[ end trace a8b437cd48391e29 ]---
Kernel panic - not syncing: Fatal exception in interrupt
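The fault address can be tied to the Code: line. Decoded as x86-64, the bytes `4c 8b b3 50 02 00 00` just before the `<41>` marker are `mov 0x250(%rbx),%r14`, and the faulting instruction `41 c7 86 f4 00 00 00 00 00 00 00` is `movl $0x0,0xf4(%r14)`. With R14 = 0 in the register dump, that store targets 0 + 0xf4, which matches CR2 and the "NULL pointer dereference at 00000000000000f4" line; Oops error code 0002 marks it as a write to a non-present page. The C sketch below models only that arithmetic: `struct outer`, `struct inner` and their fields are hypothetical placeholders for whatever cciss structures sit behind RBX and R14, which the log alone does not identify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: only the two offsets visible in the Code: bytes
 * are modelled; the real cciss types and field names are not known here. */
struct inner {
    char before[0xf4];
    uint32_t field_f4;          /* target of movl $0x0,0xf4(%r14) */
};

struct outer {
    char before[0x250];
    struct inner *ptr_250;      /* loaded by mov 0x250(%rbx),%r14 */
};

/* Address the faulting store touches for a given base pointer value,
 * computed arithmetically instead of by dereferencing the pointer. */
static uintptr_t store_address(uintptr_t base)
{
    return base + offsetof(struct inner, field_f4);
}
```

Here `store_address(0)` yields 0xf4, the same value as CR2. A fault address equal to a small field offset typically points to a NULL base pointer (here the value loaded from offset 0x250) rather than to a wild pointer.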



This is on an HP ProLiant BL685c G1, 4-proc system with
8 GB of RAM.  (same as previous reports)


Rebooting worked successfully.

Thanks,
-- 
~Randy

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: cciss: WARNING/BUG in do_cciss_intr (it's back)
  2009-02-04 17:45 cciss: WARNING/BUG in do_cciss_intr (it's back) Randy Dunlap
@ 2009-02-05 18:16 ` Miller, Mike (OS Dev)
  2009-02-06  7:44   ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Miller, Mike (OS Dev) @ 2009-02-05 18:16 UTC (permalink / raw)
  To: Randy Dunlap, ISS StorageDev, scsi, Linux Kernel Mailing List,
	Jens

 

> -----Original Message-----
> From: Randy Dunlap [mailto:randy.dunlap@oracle.com] 
> Sent: Wednesday, February 04, 2009 11:45 AM
> To: Miller, Mike (OS Dev); ISS StorageDev; scsi; Linux Kernel 
> Mailing List
> Subject: cciss: WARNING/BUG in do_cciss_intr (it's back)
> 
> Hi Mike,
> 
> Was there any debugging code added to try to help with this problem?
> or is that the WARNING before the BUG?
> 

Randy,
I think this is a different bug than the one you reported previously. Please open a new bugzilla.

Thanks,
-- mikem


* Re: cciss: WARNING/BUG in do_cciss_intr (it's back)
  2009-02-05 18:16 ` Miller, Mike (OS Dev)
@ 2009-02-06  7:44   ` Jens Axboe
  2009-02-06 16:09     ` Miller, Mike (OS Dev)
  2009-02-06 16:34     ` Randy Dunlap
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2009-02-06  7:44 UTC (permalink / raw)
  To: Miller, Mike (OS Dev)
  Cc: Randy Dunlap, ISS StorageDev, scsi, Linux Kernel Mailing List,
	James Bottomley

On Thu, Feb 05 2009, Miller, Mike (OS Dev) wrote:
>  
> 
> > -----Original Message-----
> > From: Randy Dunlap [mailto:randy.dunlap@oracle.com] 
> > Sent: Wednesday, February 04, 2009 11:45 AM
> > To: Miller, Mike (OS Dev); ISS StorageDev; scsi; Linux Kernel 
> > Mailing List
> > Subject: cciss: WARNING/BUG in do_cciss_intr (it's back)
> > 
> > Hi Mike,
> > 
> > Was there any debugging code added to try to help with this problem?
> > or is that the WARNING before the BUG?
> > 
> 
> Randy,
> I think this is a different bug than the one you reported previously.
> Please open a new bugzilla.

I think it's the same one. The first warning that now triggers is:

WARNING: at drivers/block/cciss.c:225 

which is

        if (WARN_ON(hlist_unhashed(&c->list)))

in removeQ(). This is where we would have crashed before, due to trying
to remove a command from a list it didn't belong to. And then we crash
right after in the interrupt handler. So I'm pretty sure this is 100%
the same bug.
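The warning can be illustrated with a small user-space model. The kernel's hlist marks a node that is on no list with a NULL `pprev` back-pointer, which is what `hlist_unhashed()` tests; an unchecked unlink of such a node would write through that NULL pointer. The sketch below re-implements a simplified subset of the hlist API in plain C; `struct command` and `remove_q()` are hypothetical stand-ins for the driver's command structure and removeQ(), not the actual cciss code.

```c
#include <stdio.h>

/* Simplified user-space copy of the kernel's hlist node/head layout. */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;  /* address of the pointer that points at us */
};

struct hlist_head {
    struct hlist_node *first;
};

static void init_hlist_node(struct hlist_node *n)
{
    n->next = NULL;
    n->pprev = NULL;
}

/* A node is "unhashed" (on no list) when its back-pointer is NULL. */
static int hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n and mark it unhashed; a no-op if n is on no list. */
static void hlist_del_init(struct hlist_node *n)
{
    if (hlist_unhashed(n))
        return;
    *n->pprev = n->next;        /* would write through NULL if unchecked */
    if (n->next)
        n->next->pprev = n->pprev;
    init_hlist_node(n);
}

/* Hypothetical stand-in for the driver's command structure. */
struct command {
    int tag;
    struct hlist_node list;
};

/* Guarded removal in the spirit of removeQ(): report and refuse instead of
 * unlinking a command that is on no list. Returns 0 on success, -1 if the
 * guard fires. */
static int remove_q(struct command *c)
{
    if (hlist_unhashed(&c->list)) {
        fprintf(stderr, "WARN: command %d is not on any queue\n", c->tag);
        return -1;
    }
    hlist_del_init(&c->list);
    return 0;
}
```

Calling `remove_q()` twice on the same command, or on one that was never queued, returns -1 and prints a warning instead of corrupting the queue, which is the same idea as turning the old crash point at cciss.c:225 into a WARN_ON().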

Randy, is this still using kexec? Perhaps cciss needs a better
kick-in-the-pants reset on driver load to clear EVERYTHING; there's
clearly something very bad happening there.


-- 
Jens Axboe


* RE: cciss: WARNING/BUG in do_cciss_intr (it's back)
  2009-02-06  7:44   ` Jens Axboe
@ 2009-02-06 16:09     ` Miller, Mike (OS Dev)
  2009-02-06 16:34     ` Randy Dunlap
  1 sibling, 0 replies; 7+ messages in thread
From: Miller, Mike (OS Dev) @ 2009-02-06 16:09 UTC (permalink / raw)
  To: Jens Axboe, Andrew Morton
  Cc: Randy Dunlap, ISS StorageDev, scsi, Linux Kernel Mailing List,
	James Bottomley

Jens wrote: 

> 
> I think it's the same one. The first warning that now triggers is:
> 
> WARNING: at drivers/block/cciss.c:225 
> 
> which is
> 
>         if (WARN_ON(hlist_unhashed(&c->list)))
> 
> removeQ(), this is where we would have crashed before due to 
> trying to remove a command from a list it didn't belong to. 
> And then we crash right after in the interrupt handler. So 
> I'm pretty sure this is 100% the same bug.
> 
> Randy, is this still using kexec? Perhaps cciss needs a 
> better kick-in-the-pants reset on driver load to clear 
> EVERYTHING, there's clearly something very bad happening there.
> 

I have some code that does a PCI PM reset on the controller. That's one way to ensure the controller gets sane again. Let me port it to 2.6.29-rc and I'll submit the patch.
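A PCI PM reset of this kind usually means cycling the function through the D3hot power state and back to D0 via the Power Management Control/Status Register (PMCSR). The sketch below is not Mike's patch: it assumes the caller has already located the PM capability offset and uses hypothetical, injectable config-space and delay hooks (in a kernel driver, pci_read_config_word(), pci_write_config_word() and msleep() would fill those roles) so it can be exercised in user space. The register facts it relies on come from the PCI PM specification: PMCSR sits at offset 4 of the PM capability, bits 1:0 hold the PowerState field (D0 = 00b, D3hot = 11b), and transitions to or from D3hot need a 10 ms recovery time.

```c
#include <stdint.h>

/* PCI PM register facts: PMCSR is at offset 4 of the PM capability and
 * bits 1:0 are the PowerState field. */
#define PMCSR_OFFSET        4
#define PMCSR_STATE_MASK    0x0003u
#define PM_STATE_D0         0x0000u
#define PM_STATE_D3HOT      0x0003u
#define PM_D3HOT_DELAY_MS   10u

/* Hypothetical config-space and delay hooks, injected so the sequence can
 * be tested without hardware. */
typedef uint16_t (*cfg_read16_fn)(void *dev, int where);
typedef void (*cfg_write16_fn)(void *dev, int where, uint16_t val);
typedef void (*sleep_ms_fn)(unsigned int ms);

/* Return pmcsr with only its PowerState field replaced by state. */
static uint16_t pmcsr_with_state(uint16_t pmcsr, uint16_t state)
{
    return (uint16_t)((pmcsr & ~PMCSR_STATE_MASK) | (state & PMCSR_STATE_MASK));
}

/* Cycle a function D0 -> D3hot -> D0. pm_cap is the offset of its PM
 * capability in config space (assumed to be found by the caller). */
static void pm_cycle_reset(void *dev, int pm_cap, cfg_read16_fn rd,
                           cfg_write16_fn wr, sleep_ms_fn sleep_ms)
{
    int where = pm_cap + PMCSR_OFFSET;
    uint16_t pmcsr = rd(dev, where);

    wr(dev, where, pmcsr_with_state(pmcsr, PM_STATE_D3HOT));
    sleep_ms(PM_D3HOT_DELAY_MS);    /* settle in D3hot */
    wr(dev, where, pmcsr_with_state(pmcsr, PM_STATE_D0));
    sleep_ms(PM_D3HOT_DELAY_MS);    /* D3hot -> D0 recovery time */
}
```

Whether the D3hot -> D0 transition actually clears the controller's internal state depends on the device: a function that sets the No_Soft_Reset bit (PMCSR bit 3) keeps its state across the transition.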

Thanks,
-- mikem


* Re: cciss: WARNING/BUG in do_cciss_intr (it's back)
  2009-02-06  7:44   ` Jens Axboe
  2009-02-06 16:09     ` Miller, Mike (OS Dev)
@ 2009-02-06 16:34     ` Randy Dunlap
  2010-07-01 10:18       ` Bob Zhang
  1 sibling, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2009-02-06 16:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Miller, Mike (OS Dev), ISS StorageDev, scsi,
	Linux Kernel Mailing List, James Bottomley

Jens Axboe wrote:
> On Thu, Feb 05 2009, Miller, Mike (OS Dev) wrote:
>>  
>>
>>> -----Original Message-----
>>> From: Randy Dunlap [mailto:randy.dunlap@oracle.com] 
>>> Sent: Wednesday, February 04, 2009 11:45 AM
>>> To: Miller, Mike (OS Dev); ISS StorageDev; scsi; Linux Kernel 
>>> Mailing List
>>> Subject: cciss: WARNING/BUG in do_cciss_intr (it's back)
>>>
>>> Hi Mike,
>>>
>>> Was there any debugging code added to try to help with this problem?
>>> or is that the WARNING before the BUG?
>>>
>> Randy,
>> I think this is a different bug than the one you reported previously.
>> Please open a new bugzilla.
> 
> I think it's the same one. The first warning that now triggers is:
> 
> WARNING: at drivers/block/cciss.c:225 
> 
> which is
> 
>         if (WARN_ON(hlist_unhashed(&c->list)))
> 
> removeQ(), this is where we would have crashed before due to trying to
> remove a command from a list it didn't belong to. And then we crash
> right after in the interrupt handler. So I'm pretty sure this is 100%
> the same bug.

I agree, looks like the same bug to me also.

> Randy, is this still using kexec? Perhaps cciss needs a better
> kick-in-the-pants reset on driver load to clear EVERYTHING, there's
> clearly something very bad happening there.

Yes, still using kexec for loading/testing the new kernel.



-- 
~Randy

* Re: cciss: WARNING/BUG in do_cciss_intr (it's back)
  2009-02-06 16:34     ` Randy Dunlap
@ 2010-07-01 10:18       ` Bob Zhang
  0 siblings, 0 replies; 7+ messages in thread
From: Bob Zhang @ 2010-07-01 10:18 UTC (permalink / raw)
  To: Randy Dunlap, bob_zhang2004, zhanglinbao
  Cc: Jens Axboe, Miller, Mike (OS Dev), ISS StorageDev, scsi,
	Linux Kernel Mailing List, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 497 bytes --]

Hi all,

I would like to know the final outcome.
Has this bug been fixed? If so, how was it fixed?
I am now using 2.6.32.12-7 from SLES 11 SP1 (ia64), and I still hit
this problem.


Any comments are welcome.

Another point:
>> Randy,
>> I think this is a different bug than the one you reported previously.
>> Please open a new bugzilla.
>
> I think it's the same one. The first warning that now triggers is:
>
Could you give me a link to the previous report?

The attachment contains the boot log and the error.

[-- Attachment #2: booting_info.log --]
[-- Type: application/octet-stream, Size: 19317 bytes --]

1,0,2,0 5400006309E10000 0000000000000000 EVN_BOOT_START

***********************************************************
* ROM Version : 00.44
* ROM Date    : Sun May 16 18:19:10 PDT 2010
***********************************************************

1,1,2,0 3400083749E10000 000000000002000C EVN_BOOT_CELL_JOINED_PD
1,0,2,0 340000B109E10000 0000003C0205000C EVN_MEM_DISCOVERY
1,1,2,0 340000B149E10000 0000007C0205000C EVN_MEM_DISCOVERY
1,0,2,0 Start memory test ......   0/100
.......
1,0,2,0 Memory test progress....  33/100
.......
1,0,2,0 Memory test progress....  66/100
.......
1,0,2,0 Memory test progress.... 100/100
1,0,2,0 1400002609E10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,1,2,0 1400002649E10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,0,3,0 140000260DE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,1,3,0 140000264DE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,0,3,1 140000260FE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,1,3,1 140000264FE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,1,2,1 140000264BE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,0,2,1 140000260BE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
1,0,2,0 5400020709E10000 000000000011000C EVN_EFI_START

Press Ctrl-C now to bypass loading option ROM UEFI drivers.

1,0,2,0 3400008109E10000 000000000007000C EVN_IO_DISCOVERY_START
1,0,2,0 5400020B09E10000 0000000000000006 EVN_EFI_LAUNCH_BOOT_MANAGER

(C) Copyright 1996-2010 Hewlett-Packard Development Company, L.P.

Note, menu interfaces might only display on the primary console device.
The current primary console device is:
Serial PcieRoot(0x30304352)/Pci(0x1C,0x5)/Pci(0x0,0x5)
The primary console can be changed via the 'conconfig' UEFI shell command.

Press:  ENTER  -  Start boot entry execution
        B / b  -  Launch Boot Manager (menu interface)
        D / d  -  Launch Device Manager (menu interface)
        M / m  -  Launch Boot Maintenance Manager (menu interface)
        S / s  -  Launch UEFI Shell (command line interface)
        I / i  -  Launch iLO Setup Tool (command line interface)

*** User input can now be provided ***

Automatic boot entry execution will start in 7 second(s).
Automatic boot entry execution will start in 6 second(s).
Automatic boot entry execution will start in 5 second(s).
Automatic boot entry execution will start in 4 second(s).
Automatic boot entry execution will start in 3 second(s).
Automatic boot entry execution will start in 2 second(s).
Automatic boot entry execution will start in 1 second(s).

HP Smart Array P410i Controller     (version 2.99)  1 Logical Drive
Booting SUSE Linux Enterprise Server 11


ELILO boot: Uncompressing Linux... done
Loading file initrd-2.6.32.12-7-bobdebug...done
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32.12-7-bobdebug (geeko@buildhost) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #2 SMP Thu Jul 1 03:05:18 MDT 2010
EFI v2.10 by HP: SALsystab=0x2ffffca18 ACPI 2.0=0x3b492014 HCDP=0x2ffff9698 SMBIOS=0x3b442000
booting generic kernel on platform dig_vtd
PCDP: v3 at 0x2ffff9698
Early serial console at I/O port 0x2320 (options '115200n8')
bootconsole [uart8250] enabled
ACPI: RSDP 000000003b492014 00024 (v02 HP    )
ACPI: XSDT 000000003b492580 000F4 (v01 HP     RX2800-2 00000001      01000013)
ACPI: FACP 000000003b48c000 000F4 (v03 HP     RX2800-2 00000001 HP   00000001)
ACPI Warning: 32/64X length mismatch in Pm1aEventBlock: 32/16 (20090903/tbfadt-526)
ACPI Warning: 32/64X length mismatch in Gpe0Block: 128/64 (20090903/tbfadt-526)
ACPI Warning: Invalid length for Pm1aEventBlock: 16, using default 32 (20090903/tbfadt-607)
ACPI Warning: Invalid length for Pm1aControlBlock: 32, using default 16 (20090903/tbfadt-607)
ACPI: DSDT 000000003b474000 09C40 (v02 HP     RX2800-2 00000008 INTL 20061109)
ACPI: FACS 000000003b48e000 00040
ACPI: APIC 000000003b490000 0010C (v01 HP     RX2800-2 00000001 HP   00000001)
ACPI: SPCR 000000003b48a000 00050 (v01 HP     RX2800-2 00000001 HP   00000001)
ACPI: SRAT 000000003b488000 001F8 (v02 HP     RX2800-2 00000001 HP   00000001)
ACPI: SLIT 000000003b486000 00035 (v01 HP     RX2800-2 00000001 HP   00000001)
ACPI: CPEP 000000003b484000 00034 (v01 HP     RX2800-2 00000001 HP   00000001)
ACPI: SPMI 000000003b482000 00041 (v05 HP     RX2800-2 00000001 HP   00000001)
ACPI: HPET 000000003b480000 00038 (v01 HP     RX2800-2 00000001 HP   00000001)
ACPI: DMAR 000000003b47e000 0009C (v01 HP     RX2800-2 00000001 HP   00000001)
ACPI: SSDT 000000003b472000 000E2 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b470000 00030 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b46e000 014E9 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b46c000 00092 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b46a000 00092 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b468000 00092 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b466000 000E6 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b464000 00035 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b462000 00080 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b460000 00547 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b45e000 003C6 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b45c000 003C6 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b45a000 003C6 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b458000 000A4 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b456000 00276 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b454000 000BE (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: SSDT 000000003b452000 00036 (v02 HP     RX2800-2 00000007 INTL 20061109)
ACPI: Local APIC address c0000000fee00000
Number of logical nodes in system = 2
Number of memory chunks in system = 4
SMP: Allowing 16 CPUs, 8 hotplug CPUs
Reserving 256MB of memory at 128MB for crashkernel (System RAM: 8071MB)
Initial ramdisk at: 0xe0000002f15ff000 (11478392 bytes)
SAL 3.20: HP Kauai version 3.1
SAL: AP wakeup using external interrupt vector 0xf0
ACPI: Local APIC address c0000000fee00000
GSI 20 (level, low) -> CPU 0 (0x0400) vector 48
8 CPUs available, 16 CPUs total
MCA related initialization done
Virtual mem_map starts at 0xa07fffffff580000
Zone PFN ranges:
  DMA      0x00000100 -> 0x00010000
  Normal   0x00010000 -> 0x00030000
Movable zone start PFN for each node
early_node_map[48] active PFN ranges
    0: 0x00000100 -> 0x00003b44
    0: 0x00003b4a -> 0x00003b64
    0: 0x00003c94 -> 0x00003cc3
    0: 0x00003cc6 -> 0x00003cf5
    0: 0x00003cf8 -> 0x00003d28
    0: 0x00003d2b -> 0x00003d5a
    0: 0x00003d6d -> 0x00003d9c
    0: 0x00003da0 -> 0x00003dcf
    0: 0x00003dd2 -> 0x00003e01
    0: 0x00003e04 -> 0x00003e33
    0: 0x00003e47 -> 0x00003e76
    0: 0x00003e79 -> 0x00003ea8
    0: 0x00003eab -> 0x00003edb
    0: 0x00003eee -> 0x00003f1d
    0: 0x00003f20 -> 0x00003f4f
    0: 0x00003f53 -> 0x00003f82
    0: 0x00003f85 -> 0x00003fb4
    0: 0x00003fc7 -> 0x00003ff8
    0: 0x00003ffa -> 0x00003fff
    0: 0x00014000 -> 0x0001ffff
    1: 0x00020000 -> 0x0002f2ae
    1: 0x0002f6af -> 0x0002f6b4
    1: 0x0002f6b5 -> 0x0002f6b9
    1: 0x0002f6c3 -> 0x0002f6cb
    1: 0x0002f6d5 -> 0x0002f6dd
    1: 0x0002f6e7 -> 0x0002f6ef
    1: 0x0002f6f9 -> 0x0002f719
    1: 0x0002f71a -> 0x0002f71b
    1: 0x0002f723 -> 0x0002f72a
    1: 0x0002f72b -> 0x0002f7a5
    1: 0x0002f7a7 -> 0x0002f7a8
    1: 0x0002f7aa -> 0x0002f7b9
    1: 0x0002f7bd -> 0x0002f7c4
    1: 0x0002f80a -> 0x0002f80b
    1: 0x0002f80f -> 0x0002f817
    1: 0x0002f818 -> 0x0002fd37
    1: 0x0002fd9a -> 0x0002fd9d
    1: 0x0002fda3 -> 0x0002fda4
    1: 0x0002fdab -> 0x0002fdfb
    1: 0x0002fdfd -> 0x0002fdfe
    1: 0x0002fe01 -> 0x0002fe04
    1: 0x0002fe07 -> 0x0002fe0c
    1: 0x0002fe0d -> 0x0002fe12
    1: 0x0002fe13 -> 0x0002fe1a
    1: 0x0002fe1b -> 0x0002fe28
    1: 0x0002fe29 -> 0x0002fe5b
    1: 0x0002fe5c -> 0x0002fe8c
    1: 0x0002fe8d -> 0x0002fffe
Built 2 zonelists in Zone order, mobility grouping on.  Total pages: 128935
Policy zone: Normal
Kernel command line: BOOT_IMAGE=scsi0:\efi\SuSE\vmlinuz-2.6.32.12-7-bobdebug root=/dev/disk/by-id/cciss-3600508b1001037383941424344453500-part4  splash=silent crashkernel=512M-:256M 
PID hash table entries: 4096 (order: -1, 32768 bytes)
allocated 7853960 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Memory: 7910784k/7929152k available (7782k code, 351936k reserved, 10583k data, 2176k init)
Hierarchical RCU implementation.
NR_IRQS:1024
Console: colour VGA+ 80x25
Calibrating delay loop... 3186.68 BogoMIPS (lpj=6373376)
pid_max: default: 32768 minimum: 301
kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
kdb_cmd[0]: defcmd archkdb "" "First line arch debugging"
kdb_cmd[8]: defcmd archkdbcpu "" "archkdb with only tasks on cpus"
kdb_cmd[15]: defcmd archkdbshort "" "archkdb with less detailed backtrace"
kdb_cmd[22]: defcmd archkdbcommon "" "Common arch debugging"
Security Framework initialized
AppArmor: AppArmor initialized
Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
ACPI: Core revision 20090903
Boot processor id 0x0/0x400
Fixed BSP b0 value from CPU 1
CPU 1: synchronized ITC with CPU 0 (last diff 0 cycles, maxerr 92 cycles)
CPU 2: synchronized ITC with CPU 0 (last diff -3 cycles, maxerr 619 cycles)
CPU 3: synchronized ITC with CPU 0 (last diff 1 cycles, maxerr 616 cycles)
CPU 4: synchronized ITC with CPU 0 (last diff 1 cycles, maxerr 748 cycles)
CPU 5: synchronized ITC with CPU 0 (last diff -13 cycles, maxerr 758 cycles)
CPU 6: synchronized ITC with CPU 0 (last diff -10 cycles, maxerr 760 cycles)
CPU 7: synchronized ITC with CPU 0 (last diff -21 cycles, maxerr 749 cycles)
Brought up 8 CPUs
Total of 8 processors activated (25493.50 BogoMIPS).
DMI 2.4 present.
NET: Registered protocol family 16
ACPI: bus type pci registered
bio: create slab <bio-0> at 0
version 2.82
CPU hotplug support enabled
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOSAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [RCX0] (0000:00)
pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
pci 0000:00:01.0: PME# disabled
pci 0000:00:07.0: PME# supported from D0 D3hot D3cold
pci 0000:00:07.0: PME# disabled
pci 0000:00:08.0: PME# supported from D0 D3hot D3cold
pci 0000:00:08.0: PME# disabled
pci 0000:00:09.0: PME# supported from D0 D3hot D3cold
pci 0000:00:09.0: PME# disabled
pci 0000:00:13.0: PME# supported from D0 D3hot D3cold
pci 0000:00:13.0: PME# disabled
pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1a.7: PME# disabled
pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.0: PME# disabled
pci 0000:00:1c.2: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.2: PME# disabled
pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.5: PME# disabled
pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1d.7: PME# disabled
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:01:00.0: PME# supported from D0
pci 0000:01:00.0: PME# disabled
pci 0000:00:01.0: PCI bridge to [bus 01-01]
pci 0000:00:07.0: PCI bridge to [bus 02-02]
pci 0000:00:08.0: PCI bridge to [bus 03-03]
pci 0000:00:09.0: PCI bridge to [bus 04-04]
pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
pci 0000:05:00.0: PME# disabled
pci 0000:05:00.1: PME# supported from D0 D3hot D3cold
pci 0000:05:00.1: PME# disabled
pci 0000:00:1c.0: PCI bridge to [bus 05-05]
pci 0000:06:00.0: PME# supported from D0 D3hot D3cold
pci 0000:06:00.0: PME# disabled
pci 0000:06:00.1: PME# supported from D0 D3hot D3cold
pci 0000:06:00.1: PME# disabled
pci 0000:00:1c.2: PCI bridge to [bus 06-06]
pci 0000:07:00.2: PME# supported from D0 D3hot D3cold
pci 0000:07:00.2: PME# disabled
pci 0000:00:1c.5: PCI bridge to [bus 07-07]
pci 0000:00:1e.0: PCI bridge to [bus 08-08] (subtractive decode)
pci 0000:08:03.0: BAR 6: address space collision on of device [0x58500000-0x5851ffff]
vgaarb: device added: PCI:0000:08:03.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
DMAR: Host address width 46
DMAR: DRHD base: 0x000000ef904000 flags: 0x0
IOMMU ef904000: ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: ATSR flags: 0x0
DMAR: RHSA base: 0x000000ef904000 proximity domain: 0x2
DMAR: No RMRR found
IOMMU 0xef904000: using Queued invalidation
IOMMU: Setting RMRR:
PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
Switching to clocksource itc
AppArmor: AppArmor Filesystem Enabled
pnp: PnP ACPI init
ACPI: bus type pnp registered
GSI 3 (level, low) -> CPU 1 (0x0500) vector 50
GSI 17 (level, low) -> CPU 2 (0x0600) vector 51
GSI 2 (level, low) -> CPU 3 (0x0700) vector 52
GSI 8 (level, low) -> CPU 4 (0x1400) vector 53
GSI 11 (level, low) -> CPU 5 (0x1500) vector 54
GSI 12 (level, low) -> CPU 6 (0x1600) vector 55
pnp: PnP ACPI: found 6 devices
ACPI: ACPI bus type pnp unregistered
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 3, 524288 bytes)
TCP established hash table entries: 262144 (order: 6, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 4, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
NET: Registered protocol family 1
pci 0000:05:00.0: Disabling L0s
pci 0000:05:00.1: Disabling L0s
pci 0000:06:00.0: Disabling L0s
pci 0000:06:00.1: Disabling L0s
Unpacking initramfs...
Freeing initrd memory: 11200kB freed
PAL Information Facility v0.5
Please use IA-32 EL for executing IA-32 binaries
added sampling format default-old
audit: initializing netlink socket (disabled)
type=2000 audit(1277976355.651:1): initialized
HugeTLB registered 256 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
msgmni has been set to 3868
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
added sampling format default
hpet0: at MMIO 0xfed00000, IRQs 52, 53, 54, 55
hpet0: 4 comparators, 64-bit 14.318180 MHz counter
EFI Time Services Driver v0.4
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
00:03: ttyS0 at I/O 0x2f8 (irq = 50) is a 16550A
serial 0000:07:00.5: PCI INT A -> GSI 17 (level, low) -> IRQ 51
0000:07:00.5: ttyS1 at I/O 0x2320 (irq = 51) is a 16550A
console [ttyS1] enabled, bootconsole disabled
console [ttyS1] enabled, bootconsole disabled
mice: PS/2 mouse device common for all mice
EFI Variables Facility v0.08 2004-May-17
TCP cubic registered
registered taskstats version 1
Freeing unused kernel memory: 2176kB freed
doing fast boot
SCSI subsystem initialized
HP CISS Driver (v 3.6.20)
GSI 28 (level, low) -> CPU 7 (0x1700) vector 63
cciss 0000:01:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 63
IRQ 66/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
cciss0: <0x323a> at PCI 0000:01:00.0 IRQ 66 using DAC
Entered OS MCA handler. PSP=20000800fff29320 cpu=6 monarch=1
Entered OS MCA handler. PSP=20000800fff29320 cpu=3 monarch=0
Entered OS MCA handler. PSP=a0010800fff29330 cpu=0 monarch=0
Entered OS MCA handler. PSP=20000800fff29320 cpu=7 monarch=0
Entered OS MCA handler. PSP=a0010800fff29330 cpu=2 monarch=0
Entered OS MCA handler. PSP=a0010800fff29330 cpu=0 monarch=0
Entered OS MCA handler. PSP=20000800fff29320 cpu=0 monarch=0
Entered OS MCA handler. PSP=20000800fff29320 cpu=0 monarch=0
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
OS MCA slave did not rendezvous on cpu 1 4 5
MCA: error not contained
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
MCA: error not contained
MCA: error not contained
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
MCA: error not contained
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
MCA: error not contained
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
MCA: error not contained
MCA: error not contained
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
MCA: error not contained
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
Delaying for 5 seconds...


* Re: cciss: WARNING/BUG in do_cciss_intr (it's back)
@ 2010-07-01 14:28 scameron
  0 siblings, 0 replies; 7+ messages in thread
From: scameron @ 2010-07-01 14:28 UTC
  To: zhanglinbao
  Cc: randy.dunlap, bob_zhang2004, axboe, mike.miller, iss_storagedev,
	linux-scsi, linux-kernel, james.bottomley, scameron, mikem

Bob Zhang wrote:

> Hi all,
> 
> I would like to know the final outcome.
> Has this bug been fixed? If so, how was it fixed?
> I am now using 2.6.32.12-7 from SLES 11 SP1 (ia64), and I still hit
> this problem.
> 
> 
> Any comments are welcome.
>
> Another point:
> >> Randy,
> >> I think this is a different bug than the one you reported previously.
> >> Please open a new bugzilla.
> >
> > I think it's the same one. The first warning that now triggers is:
> >
> Could you give me the link to the previous report?
>
> The attachment contains the boot log and the error.

(See http://lkml.org/lkml/2009/2/4/342 for a bit more context.)

and Jens Axboe wrote, back in Feb of 2009:

> I think it's the same one. The first warning that now triggers is:
> 
> WARNING: at drivers/block/cciss.c:225 
> 
> which is
> 
>         if (WARN_ON(hlist_unhashed(&c->list)))
>
> in removeQ(). This is where we would have crashed before due to trying to
> remove a command from a list it didn't belong to. And then we crash
> right after in the interrupt handler. So I'm pretty sure this is 100%
> the same bug.
> 
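Jens's reasoning rests on two kernel idioms: hlist_unhashed() is true for a node that is on no list, and WARN_ON() evaluates to its condition, so it can guard an early return. The following is a minimal userspace model under stated assumptions: the hlist helpers are simplified re-implementations rather than <linux/list.h>, and struct command, remove_q(), unlink_node() and warn_on() are hypothetical names, not the actual cciss code. It shows why the check matters: unlinking a node whose pprev is NULL would dereference NULL, whereas the guarded path warns and bails out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified userspace stand-ins modelled on the kernel's hlist. */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev; /* NULL while the node is on no list */
};

struct hlist_head {
    struct hlist_node *first;
};

/* Same meaning as the kernel helper: the node is not on any list. */
static bool hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    assert(hlist_unhashed(n)); /* adding a queued node twice would corrupt the list */
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unguarded unlink (hypothetical): on an unhashed node, *n->pprev dereferences NULL. */
static void unlink_node(struct hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}

/* Stand-in for WARN_ON(): log the hit and hand the condition back. */
static int warn_count;

static bool warn_on(bool cond, const char *msg)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

/* Hypothetical command type and removeQ()-style helper. */
struct command {
    int tag;
    struct hlist_node list;
};

static bool remove_q(struct command *c)
{
    if (warn_on(hlist_unhashed(&c->list), "removeQ() on a command that is not queued"))
        return false; /* bail out instead of unlinking a node that is on no list */
    unlink_node(&c->list);
    return true;
}
```

In this model, remove_q() on a queued command unlinks it and returns true; on a command that was never queued it prints a warning and returns false. The guard only avoids the list corruption itself: the caller still holds a completion for a command it never queued, which is consistent with Jens's remark that the driver then crashes right afterwards in the interrupt handler.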

I did not see a similar error in the log file you provided.

The above problem appeared to be triggered by the reset_devices path (e.g. kdump) picking
up completions from the previous kernel, due to the device not actually being reset.
Smart Array controllers since the P600 cannot be reset by the PCI power
management method. Some of them can be reset by using the "doorbell" register,
and a patch that does this for hpsa has been implemented:

http://marc.info/?l=linux-scsi&m=127671403229420&w=2

which is one patch in a larger series of patches to hpsa.

I am currently working on a similar series of patches for cciss.

However, this won't help the P400, P400i, E500, P800, and P700m, which cannot
be reset by either method. Also, although the 6402 and 6404 can be reset,
doing so is inadvisable because they share a battery-backed cache
module; hence this patch to hpsa:

http://marc.info/?l=linux-scsi&m=127671403029407&w=2
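The reset constraints described above amount to a per-model policy table. A minimal sketch, assuming model-name strings as keys (a real driver would more likely key on PCI IDs) and treating every unlisted controller as a doorbell-reset candidate, which the message says works only for some boards; enum reset_policy and reset_policy_for() are hypothetical names, not hpsa or cciss code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum reset_policy {
    RESET_MAYBE_DOORBELL, /* assumption: any unlisted board; doorbell reset may still fail */
    RESET_NOT_POSSIBLE,   /* neither PCI power management nor doorbell reset works */
    RESET_INADVISABLE,    /* resettable, but shares a battery-backed cache module */
};

struct board_policy {
    const char *model;
    enum reset_policy policy;
};

/* Only the models named in the message; everything else falls through. */
static const struct board_policy policies[] = {
    { "P400",  RESET_NOT_POSSIBLE },
    { "P400i", RESET_NOT_POSSIBLE },
    { "E500",  RESET_NOT_POSSIBLE },
    { "P800",  RESET_NOT_POSSIBLE },
    { "P700m", RESET_NOT_POSSIBLE },
    { "6402",  RESET_INADVISABLE },
    { "6404",  RESET_INADVISABLE },
};

static enum reset_policy reset_policy_for(const char *model)
{
    assert(model != NULL);
    for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
        if (strcmp(policies[i].model, model) == 0)
            return policies[i].policy;
    return RESET_MAYBE_DOORBELL;
}
```

Keeping the known exceptions in one table, rather than scattering model checks through the reset path, makes it easy to add boards as their reset behaviour is confirmed.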

See also: https://bugzilla.redhat.com/show_bug.cgi?id=609522
and https://bugzilla.redhat.com/show_bug.cgi?id=598681
(you need an account to see those, I think.)

-- steve



Thread overview: 7+ messages
2009-02-04 17:45 cciss: WARNING/BUG in do_cciss_intr (it's back) Randy Dunlap
2009-02-05 18:16 ` Miller, Mike (OS Dev)
2009-02-06  7:44   ` Jens Axboe
2009-02-06 16:09     ` Miller, Mike (OS Dev)
2009-02-06 16:34     ` Randy Dunlap
2010-07-01 10:18       ` Bob Zhang
  -- strict thread matches above, loose matches on Subject: below --
2010-07-01 14:28 scameron
