From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757287AbZBFQem (ORCPT ); Fri, 6 Feb 2009 11:34:42 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1760011AbZBFQeL (ORCPT ); Fri, 6 Feb 2009 11:34:11 -0500
Received: from rcsinet12.oracle.com ([148.87.113.124]:44547
	"EHLO rgminet12.oracle.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1759868AbZBFQeJ (ORCPT );
	Fri, 6 Feb 2009 11:34:09 -0500
Message-ID: <498C667D.3080702@oracle.com>
Date: Fri, 06 Feb 2009 08:34:05 -0800
From: Randy Dunlap
Organization: Oracle Linux Engineering
User-Agent: Thunderbird 2.0.0.6 (X11/20070801)
MIME-Version: 1.0
To: Jens Axboe
CC: "Miller, Mike (OS Dev)" , ISS StorageDev , scsi ,
	Linux Kernel Mailing List , James Bottomley
Subject: Re: cciss: WARNING/BUG in do_cciss_intr (it's back)
References: <4989D435.7010003@oracle.com>
	<0F5B06BAB751E047AB5C87D1F77A778859F9EAD9D5@GVW0547EXC.americas.hpqcorp.net>
	<20090206074427.GR30821@kernel.dk>
In-Reply-To: <20090206074427.GR30821@kernel.dk>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Source-IP: acsmt702.oracle.com [141.146.40.80]
X-Auth-Type: Internal IP
X-CT-RefId: str=0001.0A090207.498C6675.011B:SCFSTAT928724,ss=1,fgs=0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Jens Axboe wrote:
> On Thu, Feb 05 2009, Miller, Mike (OS Dev) wrote:
>>
>>
>>> -----Original Message-----
>>> From: Randy Dunlap [mailto:randy.dunlap@oracle.com]
>>> Sent: Wednesday, February 04, 2009 11:45 AM
>>> To: Miller, Mike (OS Dev); ISS StorageDev; scsi; Linux Kernel
>>> Mailing List
>>> Subject: cciss: WARNING/BUG in do_cciss_intr (it's back)
>>>
>>> Hi Mike,
>>>
>>> Was there any debugging code added to try to help with this problem?
>>> or is that the WARNING before the BUG?
>>>
>> Randy,
>> I think this is a different bug than the one you reported previously.
>> Please open a new bugzilla.
>
> I think it's the same one. The first warning that now triggers is:
>
> WARNING: at drivers/block/cciss.c:225
>
> which is
>
>         if (WARN_ON(hlist_unhashed(&c->list)))
>
> removeQ(), this is where we would have crashed before due to trying to
> remove a command from a list it didn't belong to. And then we crash
> right after in the interrupt handler. So I'm pretty sure this is 100%
> the same bug.

I agree, looks like the same bug to me also.

> Randy, is this still using kexec? Perhaps cciss needs a better
> kick-in-the-pants reset on driver load to clear EVERYTHING, there's
> clearly something very bad happening there.

Yes, still using kexec for loading/testing the new kernel.

>
>> Thanks,
>> -- mikem
>>
>>> Booting 2.6.29-rc3-git6 oopsed with:
>>>
>>> calling cciss_init+0x0/0x2e [cciss] @ 733 HP CISS Driver (v 3.6.20)
>>> ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 54 cciss
>>> 0000:42:08.0: PCI INT A -> Link[LNKA] -> GSI 54 (level, high)
>>> -> IRQ 54 cciss 0000:42:08.0: irq 56 for MSI/MSI-X IRQ
>>> 56/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
>>> cciss0: <0x3238> at PCI 0000:42:08.0 IRQ 56 using DAC
>>> ------------[ cut here ]------------
>>> WARNING: at drivers/block/cciss.c:225
>>> do_cciss_intr+0x58f/0x99a [cciss]() Hardware name: ProLiant
>>> BL685c G1 Modules linked in: cciss(+) ehci_hcd ohci_hcd uhci_hcd
>>> Pid: 0, comm: swapper Not tainted 2.6.29-rc3-git6 #1 Call Trace:
>>> [] warn_slowpath+0xd3/0xf2
>>> [] ? __mod_timer+0xc1/0xd3
>>> [] ? smi_timeout+0xd9/0xe5
>>> [] ? ktime_get_ts+0x49/0x4e
>>> [] ? smi_timeout+0x0/0xe5
>>> [] do_cciss_intr+0x58f/0x99a [cciss]
>>> [] handle_IRQ_event+0x27/0x57
>>> [] handle_edge_irq+0xde/0x11f
>>> [] do_IRQ+0xdc/0x152 []
>>> ret_from_intr+0x0/0xa <4>---[ end trace a8b437cd48391e28 ]---
>>> BUG: unable to handle kernel NULL pointer dereference at
>>> 00000000000000f4
>>> IP: [] do_cciss_intr+0x5d7/0x99a [cciss] PGD 0
>>> Oops: 0002 [#1] SMP
>>> last sysfs file: /sys/block/ram15/dev
>>> CPU 2
>>> Modules linked in: cciss(+) ehci_hcd ohci_hcd uhci_hcd
>>> Pid: 0, comm: swapper Tainted: G W 2.6.29-rc3-git6 #1
>>> RIP: 0010:[] []
>>> do_cciss_intr+0x5d7/0x99a [cciss]
>>> RSP: 0018:ffff88027f12fef0 EFLAGS: 00010046
>>> RAX: 0000000000000000 RBX: ffff88007f840270 RCX: 0000000000013888
>>> RDX: 0000000000008080 RSI: 0000000000000046 RDI: 0000000000000009
>>> RBP: ffff88027f12ff20 R08: 000000447f12fa70 R09: ffff88017e540700
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007f8404b0
>>> R13: ffff88027e1a0000 R14: 0000000000000000 R15: 0000000000000086
>>> FS: 0000000000680850(0000) GS:ffff88017f121380(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>> CR2: 00000000000000f4 CR3: 0000000000201000 CR4: 00000000000006e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>>> 0000000000000400 Process swapper (pid: 0, threadinfo
>>> ffff88017f164000, task ffff88017fa5d4c0)
>>> Stack:
>>> 0000000000000001 ffff88027f126280 0000000000000000 0000000000000000
>>> 0000000000000038 0000000000000000 ffff88027f12ff50
>>> ffffffff8026ed21 ffffffff8076e000 0000000000000038
>>> ffff88027f126280 ffffffff8076e054 Call Trace:
>>> <0> [] handle_IRQ_event+0x27/0x57
>>> [] handle_edge_irq+0xde/0x11f
>>> [] do_IRQ+0xdc/0x152 []
>>> ret_from_intr+0x0/0xa <0>Code: 50 08 48 c7 83 40 02 00
>>> 00 00 00 00 00 49 c7 44 24 08 00 00 00 00 8b 83 34 02 00 00
>>> 85 c0 0f 85 49 03 00 00 4c 8b b3 50 02 00 00 <41> c7 86 f4 00
>>> 00 00 00 00 00 00 4c 8b 83 28 02 00 00 66 41 8b RIP
>>> [] do_cciss_intr+0x5d7/0x99a [cciss] RSP
>>>
>>> CR2: 00000000000000f4
>>> ---[ end trace a8b437cd48391e29 ]---
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>
>>>
>>>
>>> This is on an HP ProLiant BL685c G1, 4-proc system with
>>> 8 GB of RAM. (same as previous reports)
>>>
>>>
>>> Rebooting worked successfully.
>>>
>>> Thanks,
>>> --
>>> ~Randy

-- 
~Randy
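
For readers following the removeQ() discussion above: the check quoted by Jens refuses to unlink a command that is not on any list. The following is only a user-space sketch of that idea, not the cciss driver's code. The hlist helpers imitate the semantics of the kernel's helpers of the same names, while `struct command`, its `tag` field, and `remove_command()` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* User-space stand-ins that imitate the kernel's hlist types. */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;   /* NULL while the node is on no list */
};

struct hlist_head {
    struct hlist_node *first;
};

/* A node is "unhashed" when it is not linked into any list. */
static int hlist_unhashed(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    assert(hlist_unhashed(n));   /* adding a linked node would corrupt the list */
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink the node and mark it unhashed again. */
static void hlist_del_init(struct hlist_node *n)
{
    if (hlist_unhashed(n))
        return;
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}

/* Hypothetical command structure; the real driver's layout differs. */
struct command {
    int tag;
    struct hlist_node list;
};

/*
 * Guarded removal: warn and return -1 if the command is on no list,
 * otherwise unlink it and return 0.
 */
static int remove_command(struct command *c)
{
    if (hlist_unhashed(&c->list)) {
        fprintf(stderr, "warning: command %d is not on any list\n", c->tag);
        return -1;
    }
    hlist_del_init(&c->list);
    return 0;
}
```

In this sketch a caller that gets -1 back would be expected to stop processing that command. The trace above suggests the handler instead carried on after the warning and hit the NULL pointer dereference shortly afterwards, though that reading is an inference from the log rather than something the sketch demonstrates.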