From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tyler
Subject: Re: Failed assertion in the MegaRAID driver
Date: Mon, 09 May 2005 16:07:03 -0700
Message-ID: <427FED17.9010906@dtbb.net>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: "J. Ryan Earl"
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.com
List-Id: linux-raid.ids

I would suggest swapping your RAM for some known-good RAM. We were
having RAID oopses that cleared up after replacing a bad stick of memory
(or possibly because, once we took it out, the system was no longer
trying to do dual-channel memory timing).

Regards,
Tyler.

J. Ryan Earl wrote:

>I'm having a problem on a Java application server under load. It's kernel
>panicking, which prevents me from creating new sessions, but I can check dmesg
>with a session opened before the panic. It's happened a few times,
>typically with over 1000 clients connected--i.e. some level of concurrency.
>The last time I got an additional error after the megaraid problem, which could
>just be further fallout from the first failure. Output follows:
>
>Assertion failure in journal_commit_transaction() at fs/jbd/commit.c:138:
>"journal->j_running_transaction != NULL"
>------------[ cut here ]------------
>kernel BUG at fs/jbd/commit.c:138!
>invalid operand: 0000 [#1]
>SMP
>Modules linked in: nls_utf8 cifs nfs lockd md5 ipv6 autofs4 sunrpc button
>battery ac ohci_hcd tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd
>dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
>CPU: 2
>EIP: 0060:[] Not tainted VLI
>EFLAGS: 00010212 (2.6.9-5.0.5.ELsmp)
>EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
>eax: 00000076 ebx: f7ec4e14 ecx: f74c5de0 edx: f88647de
>esi: f7ec4e00 edi: 00000001 ebp: 00000000 esp: f74c5ddc
>ds: 007b es: 007b ss: 0068
>Process kjournald (pid: 235, threadinfo=f74c5000 task=c2248330)
>Stack: f88647de f8863e9c f88647ce 0000008a f88647a7 f61970b0 00000000
>00000000
> 00000000 00000000 00000000 c0771c8c f7ec4e00 f6a3c71c 0000100b
>00000000
> c2248330 c011e8a2 f74c5e44 f74c5e44 f754a054 f8836f26 f74c5e44
>00000000
>Call Trace:
> [] autoremove_wake_function+0x0/0x2d
> [] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
> [] autoremove_wake_function+0x0/0x2d
> [] del_timer_sync+0x7a/0x9c
> [] kjournald+0xc7/0x213 [jbd]
> [] autoremove_wake_function+0x0/0x2d
> [] autoremove_wake_function+0x0/0x2d
> [] schedule_tail+0x12/0x55
> [] commit_timeout+0x0/0x5 [jbd]
> [] kjournald+0x0/0x213 [jbd]
> [] kernel_thread_helper+0x5/0xb
>Code: 3b 00 00 8b 44 24 1c 83 78 38 00 75 29 68 a7 47 86 f8 68 8a 00 00 00
>68 ce 47 86 f8 68 9c 3e 86 f8 68 de 47 86 f8 e8 a2 18 8c c7 <0f> 0b 8a 00 ce
>47 86 f8 83 c4 14 8b 54 24 1c 83 7a 3c 00 74 29
> <1>Unable to handle kernel NULL pointer dereference at virtual address
>00000010
> printing eip:
>f8b7aada
>*pde = 35d53001
>Oops: 0000 [#2]
>SMP
>Modules linked in: nls_utf8 cifs nfs lockd md5 ipv6 autofs4 sunrpc button
>battery ac ohci_hcd tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd
>dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
>CPU: 0
>EIP: 0060:[] Not tainted VLI
>EFLAGS: 00010a02 (2.6.9-5.0.5.ELsmp)
>EIP is at is_valid_oplock_break+0xc8/0x19b [cifs]
>eax: 00004ead ebx: 00000010 ecx: 0000ff00 edx: f8b91d14
>esi: c220d480 edi: d1299580 ebp: 00000037 esp: f7417f9c
>ds: 007b es: 007b ss: 0068
>Process cifsd (pid: 2956, threadinfo=f7417000 task=f5c34d30)
>Stack: c220c280 00000000 c220c2f0 f8b6fcbe f649ad00 d1299580 00000037
>d12995b7
> 00000000 f621c130 00000000 f7417fb8 00000001 00000000 00000000
>00000000
> 00000000 f8b6f79c 00000000 00000000 00000000 c01041f1 c220c280
>00000000
>Call Trace:
> [] cifs_demultiplex_thread+0x522/0x782 [cifs]
> [] cifs_demultiplex_thread+0x0/0x782 [cifs]
> [] kernel_thread_helper+0x5/0xb
>Code: 35 f0 1c b9 f8 8b 06 0f 18 00 90 81 fe f0 1c b9 f8 0f 84 c0 00 00 00
>0f b7 47 1c 66 39 86 84 00 00 00 0f 85 a8 00 00 00 8b 5e 08 <8b> 03 0f 18 00
>90 8d 46 08 39 c3 74 7e 0f b7 43 18 66 39 47 29
>
>
>/snip
>
>Now I'm wondering if this is more of a hardware problem, or a software
>problem. I was running Gentoo with a 2.6.11.4 derived kernel on the same
>box before switching to RHEL4, and was getting panics inside of ReiserFS,
>which prompted the switch to RHEL4. My hardware vendor is trying to
>replicate the problem now. I'm going to try replacing the RAID card, but
>what else should I check? Anyone seen this problem before?
>
>Thanks in advance for any help, please respond directly to me as well as the
>lists,
>
>J. Ryan Earl
>Systems/Network Engineer
>dynaConnections Corporation
>512.306.9898