Failed assertion in the MegaRAID driver

* Failed assertion in the MegaRAID driver
@ 2005-05-09 22:38 J. Ryan Earl
  2005-05-09 23:07 ` Tyler
  0 siblings, 1 reply; 2+ messages in thread
From: J. Ryan Earl @ 2005-05-09 22:38 UTC
  To: linux-raid, linux-kernel

I'm having a problem on a Java application server under load.  The kernel is
panicking, which prevents me from creating new sessions, but I can check dmesg
from a session opened before the panic.  It's happened a few times,
typically with over 1000 clients connected, i.e. under some level of
concurrency.  The last time, I got an additional error after the megaraid
problem; it could just be further fallout from the first failure.  Output
follows:

Assertion failure in journal_commit_transaction() at fs/jbd/commit.c:138:
"journal->j_running_transaction != NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/commit.c:138!
invalid operand: 0000 [#1]
SMP
Modules linked in: nls_utf8 cifs nfs lockd md5 ipv6 autofs4 sunrpc button
battery ac ohci_hcd tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd
dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
CPU:    2
EIP:    0060:[<f885f268>]    Not tainted VLI
EFLAGS: 00010212   (2.6.9-5.0.5.ELsmp)
EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
eax: 00000076   ebx: f7ec4e14   ecx: f74c5de0   edx: f88647de
esi: f7ec4e00   edi: 00000001   ebp: 00000000   esp: f74c5ddc
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 235, threadinfo=f74c5000 task=c2248330)
Stack: f88647de f8863e9c f88647ce 0000008a f88647a7 f61970b0 00000000 00000000
       00000000 00000000 00000000 c0771c8c f7ec4e00 f6a3c71c 0000100b 00000000
       c2248330 c011e8a2 f74c5e44 f74c5e44 f754a054 f8836f26 f74c5e44 00000000
Call Trace:
 [<c011e8a2>] autoremove_wake_function+0x0/0x2d
 [<f8836f26>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
 [<c011e8a2>] autoremove_wake_function+0x0/0x2d
 [<c0127dda>] del_timer_sync+0x7a/0x9c
 [<f8861e6d>] kjournald+0xc7/0x213 [jbd]
 [<c011e8a2>] autoremove_wake_function+0x0/0x2d
 [<c011e8a2>] autoremove_wake_function+0x0/0x2d
 [<c011bcf0>] schedule_tail+0x12/0x55
 [<f8861da0>] commit_timeout+0x0/0x5 [jbd]
 [<f8861da6>] kjournald+0x0/0x213 [jbd]
 [<c01041f1>] kernel_thread_helper+0x5/0xb
Code: 3b 00 00 8b 44 24 1c 83 78 38 00 75 29 68 a7 47 86 f8 68 8a 00 00 00 68 ce 47 86 f8 68 9c 3e 86 f8 68 de 47 86 f8 e8 a2 18 8c c7 <0f> 0b 8a 00 ce 47 86 f8 83 c4 14 8b 54 24 1c 83 7a 3c 00 74 29
 <1>Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8b7aada
*pde = 35d53001
Oops: 0000 [#2]
SMP
Modules linked in: nls_utf8 cifs nfs lockd md5 ipv6 autofs4 sunrpc button
battery ac ohci_hcd tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd
dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8b7aada>]    Not tainted VLI
EFLAGS: 00010a02   (2.6.9-5.0.5.ELsmp)
EIP is at is_valid_oplock_break+0xc8/0x19b [cifs]
eax: 00004ead   ebx: 00000010   ecx: 0000ff00   edx: f8b91d14
esi: c220d480   edi: d1299580   ebp: 00000037   esp: f7417f9c
ds: 007b   es: 007b   ss: 0068
Process cifsd (pid: 2956, threadinfo=f7417000 task=f5c34d30)
Stack: c220c280 00000000 c220c2f0 f8b6fcbe f649ad00 d1299580 00000037 d12995b7
       00000000 f621c130 00000000 f7417fb8 00000001 00000000 00000000 00000000
       00000000 f8b6f79c 00000000 00000000 00000000 c01041f1 c220c280 00000000
Call Trace:
 [<f8b6fcbe>] cifs_demultiplex_thread+0x522/0x782 [cifs]
 [<f8b6f79c>] cifs_demultiplex_thread+0x0/0x782 [cifs]
 [<c01041f1>] kernel_thread_helper+0x5/0xb
Code: 35 f0 1c b9 f8 8b 06 0f 18 00 90 81 fe f0 1c b9 f8 0f 84 c0 00 00 00 0f b7 47 1c 66 39 86 84 00 00 00 0f 85 a8 00 00 00 8b 5e 08 <8b> 03 0f 18 00 90 8d 46 08 39 c3 74 7e 0f b7 43 18 66 39 47 29


/snip

Now I'm wondering if this is more of a hardware problem, or a software
problem.  I was running Gentoo with a 2.6.11.4 derived kernel on the same
box before switching to RHEL4, and was getting panics inside of ReiserFS,
which prompted the switch to RHEL4.  My hardware vendor is trying to
replicate the problem now.  I'm going to try replacing the RAID card, but
what else should I check?  Anyone seen this problem before?
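
For triage, it can help to list which function and module each oops
actually faulted in, since scattered faults in unrelated modules tend to
point at hardware more than at one driver.  Below is a minimal sketch that
does this for 2.6-era i386 oops text like the output above.  The function
name summarize_oopses and the two regular expressions are made up for
illustration and only cover the "EIP is at" and "Process" line formats
seen here; other kernels or architectures print these lines differently.

```python
import re

# Matches lines such as:
#   EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
# The trailing "[module]" part is absent for code built into the kernel.
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")

# Matches lines such as:
#   Process kjournald (pid: 235, threadinfo=f74c5000 task=c2248330)
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarize_oopses(log_text):
    """Return one (function, module, process) tuple per oops in log_text."""
    results = []
    current = None
    for line in log_text.splitlines():
        m = EIP_RE.search(line)
        if m:
            # A new "EIP is at" line starts a new oops record.
            current = [m.group(1), m.group(2) or "core kernel", None]
            results.append(current)
            continue
        m = PROC_RE.search(line)
        if m and current is not None and current[2] is None:
            # Attach the first "Process" line that follows to that record.
            current[2] = m.group(1)
    return [tuple(r) for r in results]


# Lines copied from the two oopses in this message.
SAMPLE = """\
EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
Process kjournald (pid: 235, threadinfo=f74c5000 task=c2248330)
EIP is at is_valid_oplock_break+0xc8/0x19b [cifs]
Process cifsd (pid: 2956, threadinfo=f7417000 task=f5c34d30)
"""

for func, module, proc in summarize_oopses(SAMPLE):
    print(f"{proc}: faulted in {func} [{module}]")
```

On the output above, this reports one fault in jbd (kjournald) and one in
cifs (cifsd); megaraid_isr only appears as an address on the first stack.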

Thanks in advance for any help.  Please respond directly to me as well as
to the lists.

J. Ryan Earl
Systems/Network Engineer
dynaConnections Corporation
512.306.9898



* Re: Failed assertion in the MegaRAID driver
  2005-05-09 22:38 Failed assertion in the MegaRAID driver J. Ryan Earl
@ 2005-05-09 23:07 ` Tyler
  0 siblings, 0 replies; 2+ messages in thread
From: Tyler @ 2005-05-09 23:07 UTC
  To: J. Ryan Earl; +Cc: linux-raid, linux-kernel

I would suggest checking your RAM, or swapping it out for some known-good
RAM.  We were having RAID oopses that cleared up after we replaced a bad
stick of memory (or possibly because, once we took it out, the system was
no longer trying to do dual-channel memory timing).

Regards,
Tyler.

