* Failed assertion in the MegaRAID driver
From: J. Ryan Earl @ 2005-05-09 22:38 UTC
To: linux-raid, linux-kernel
I'm having a problem with a Java application server under load. The kernel
is panicking, which prevents me from creating new sessions, but I can still
check dmesg from a session opened before the panic. It has happened a few
times, typically with over 1000 clients connected, i.e. with some level of
concurrency. The last time, I got an additional error after the megaraid
problem; it could just be further fallout from the first failure. Output
follows:
Assertion failure in journal_commit_transaction() at fs/jbd/commit.c:138:
"journal->j_running_transaction != NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/commit.c:138!
invalid operand: 0000 [#1]
SMP
Modules linked in: nls_utf8 cifs nfs lockd md5 ipv6 autofs4 sunrpc button
battery ac ohci_hcd tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd
dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
CPU: 2
EIP: 0060:[<f885f268>] Not tainted VLI
EFLAGS: 00010212 (2.6.9-5.0.5.ELsmp)
EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
eax: 00000076 ebx: f7ec4e14 ecx: f74c5de0 edx: f88647de
esi: f7ec4e00 edi: 00000001 ebp: 00000000 esp: f74c5ddc
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 235, threadinfo=f74c5000 task=c2248330)
Stack: f88647de f8863e9c f88647ce 0000008a f88647a7 f61970b0 00000000 00000000
       00000000 00000000 00000000 c0771c8c f7ec4e00 f6a3c71c 0000100b 00000000
       c2248330 c011e8a2 f74c5e44 f74c5e44 f754a054 f8836f26 f74c5e44 00000000
Call Trace:
[<c011e8a2>] autoremove_wake_function+0x0/0x2d
[<f8836f26>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
[<c011e8a2>] autoremove_wake_function+0x0/0x2d
[<c0127dda>] del_timer_sync+0x7a/0x9c
[<f8861e6d>] kjournald+0xc7/0x213 [jbd]
[<c011e8a2>] autoremove_wake_function+0x0/0x2d
[<c011e8a2>] autoremove_wake_function+0x0/0x2d
[<c011bcf0>] schedule_tail+0x12/0x55
[<f8861da0>] commit_timeout+0x0/0x5 [jbd]
[<f8861da6>] kjournald+0x0/0x213 [jbd]
[<c01041f1>] kernel_thread_helper+0x5/0xb
Code: 3b 00 00 8b 44 24 1c 83 78 38 00 75 29 68 a7 47 86 f8 68 8a 00 00 00
68 ce 47 86 f8 68 9c 3e 86 f8 68 de 47 86 f8 e8 a2 18 8c c7 <0f> 0b 8a 00 ce
47 86 f8 83 c4 14 8b 54 24 1c 83 7a 3c 00 74 29
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000010
printing eip:
f8b7aada
*pde = 35d53001
Oops: 0000 [#2]
SMP
Modules linked in: nls_utf8 cifs nfs lockd md5 ipv6 autofs4 sunrpc button
battery ac ohci_hcd tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd
dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f8b7aada>] Not tainted VLI
EFLAGS: 00010a02 (2.6.9-5.0.5.ELsmp)
EIP is at is_valid_oplock_break+0xc8/0x19b [cifs]
eax: 00004ead ebx: 00000010 ecx: 0000ff00 edx: f8b91d14
esi: c220d480 edi: d1299580 ebp: 00000037 esp: f7417f9c
ds: 007b es: 007b ss: 0068
Process cifsd (pid: 2956, threadinfo=f7417000 task=f5c34d30)
Stack: c220c280 00000000 c220c2f0 f8b6fcbe f649ad00 d1299580 00000037 d12995b7
       00000000 f621c130 00000000 f7417fb8 00000001 00000000 00000000 00000000
       00000000 f8b6f79c 00000000 00000000 00000000 c01041f1 c220c280 00000000
Call Trace:
[<f8b6fcbe>] cifs_demultiplex_thread+0x522/0x782 [cifs]
[<f8b6f79c>] cifs_demultiplex_thread+0x0/0x782 [cifs]
[<c01041f1>] kernel_thread_helper+0x5/0xb
Code: 35 f0 1c b9 f8 8b 06 0f 18 00 90 81 fe f0 1c b9 f8 0f 84 c0 00 00 00
0f b7 47 1c 66 39 86 84 00 00 00 0f 85 a8 00 00 00 8b 5e 08 <8b> 03 0f 18 00
90 8d 46 08 39 c3 74 7e 0f b7 43 18 66 39 47 29
/snip
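To see at a glance which subsystem each oops landed in, the "EIP is at
symbol+offset/size [module]" lines can be pulled out of a saved dmesg. Below is
a minimal Python sketch, assuming the 2.6-era i386 oops format shown above and
that the log has already been copied into a string. The names fault_sites and
FaultSite are hypothetical helpers, not part of any kernel tool.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed format (2.6-era i386 oops), e.g.:
#   EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
# i.e. symbol + offset / function size, optionally followed by [module].
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>[A-Za-z0-9_.]+)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


class FaultSite(NamedTuple):
    symbol: str
    offset: int              # byte offset of EIP into the function
    size: int                # total size of the function in bytes
    module: Optional[str]    # None when no [module] tag is printed


def fault_sites(log: str) -> List[FaultSite]:
    """Return one FaultSite per 'EIP is at ...' line, in log order."""
    sites = []
    for m in EIP_RE.finditer(log):
        sites.append(FaultSite(
            symbol=m.group("symbol"),
            offset=int(m.group("offset"), 16),
            size=int(m.group("size"), 16),
            module=m.group("module"),
        ))
    return sites


if __name__ == "__main__":
    # The two EIP lines from the oopses quoted above.
    excerpt = """
EIP is at journal_commit_transaction+0x5d/0xfb1 [jbd]
EIP is at is_valid_oplock_break+0xc8/0x19b [cifs]
"""
    for site in fault_sites(excerpt):
        print(f"{site.module or 'kernel'}: {site.symbol} "
              f"at +{site.offset:#x} of {site.size:#x} bytes")
```

Run against the two oopses above, this lists one fault in jbd
(journal_commit_transaction) and one in cifs (is_valid_oplock_break).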
Now I'm wondering whether this is more of a hardware problem or a software
problem. I was running Gentoo with a 2.6.11.4-derived kernel on the same box
before switching to RHEL4, and was getting panics inside ReiserFS, which
prompted the switch to RHEL4. My hardware vendor is trying to replicate the
problem now. I'm going to try replacing the RAID card, but what else should I
check? Has anyone seen this problem before?
Thanks in advance for any help. Please respond directly to me as well as to
the lists.
J. Ryan Earl
Systems/Network Engineer
dynaConnections Corporation
512.306.9898
* Re: Failed assertion in the MegaRAID driver
From: Tyler @ 2005-05-09 23:07 UTC
To: J. Ryan Earl; +Cc: linux-raid, linux-kernel
I would suggest checking your RAM, or swapping it out for some known-good RAM.
We were having RAID oopses that cleared up after we replaced a bad stick of
memory (or possibly because, once it was taken out, the system was no longer
trying to run dual-channel memory timing).
Regards,
Tyler.