* 2.6.10-ac10 oops in journal_commit_transaction
@ 2005-03-03 13:45 Brice Figureau
2005-03-03 23:37 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Brice Figureau @ 2005-03-03 13:45 UTC (permalink / raw)
To: linux-kernel
Hi,
I'm reporting an oops on a bi-Xeon database server under 2.6.10-ac10
quite similar to:
http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2
I also got another server crashing (a mail server this time), but I
couldn't get the oops/panic.
This was after more than two weeks of uptime, I was running 2.6.10-ac1
before and never got this problem.
Here are the oops information:
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
c01a858d
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
CPU: 2
EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI
EFLAGS: 00010286 (2.6.10-ac10)
EIP is at journal_commit_transaction+0x36d/0x1490
eax: db38a56c ebx: 00000000 ecx: 00000000 edx: f7779480
esi: f76fa000 edi: db38a56c ebp: f76fbf60 esp: f76fbdc8
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1206, threadinfo=f76fa000 task=f7454020)
Stack: f191fadc f191fadc 00000008 00000aa2 f76fbe04 f7fea4c0 f7c305b0 00000000
f77794b8 f7fea414 00000000 00000000 00000000 00000000 00000000 db313efc
f7779480 e4079c2c 00000aa2 00000001 f76fbe28 c01239b0 00000001 f76fbea8
Call Trace:
[show_stack+127/160] show_stack+0x7f/0xa0
[show_registers+351/464] show_registers+0x15f/0x1d0
[die+256/400] die+0x100/0x190
[do_page_fault+672/1712] do_page_fault+0x2a0/0x6b0
[error_code+43/48] error_code+0x2b/0x30
[kjournald+212/576] kjournald+0xd4/0x240
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Code: 8b 85 a0 fe ff ff 85 c0 0f 85 4f 0e 00 00 8b 95 a8 fe ff ff 8b 42 18 85 c0 0f 84 85 00 00 00 be 00 e0 ff ff 21 e6 8b 78 20 8b 1f <f0> ff 43 0c 8b 03 a8 04 0f 85 de 0d 00 00 89 5c 24 04 8b 4d 08
<6>note: kjournald[1206] exited with preempt_count 1
The code crashes in fs/jbd/commit.c journal_commit_transaction in this
particular area at line 314:
...
/*
* Wait for all previously submitted IO to complete.
*/
while (commit_transaction->t_locked_list) {
struct buffer_head *bh;
jh = commit_transaction->t_locked_list->b_tprev;
bh = jh2bh(jh);
get_bh(bh); <--- crash here because bh is NULL
if (buffer_locked(bh)) {
spin_unlock(&journal->j_list_lock);
wait_on_buffer(bh);
if (unlikely(!buffer_uptodate(bh)))
err = -EIO;
spin_lock(&journal->j_list_lock);
}
if (!inverted_lock(journal, bh)) {
put_bh(bh);
spin_lock(&journal->j_list_lock);
continue;
}
if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) {
__journal_unfile_buffer(jh);
jbd_unlock_bh_state(bh);
journal_remove_journal_head(bh);
put_bh(bh);
} else {
jbd_unlock_bh_state(bh);
}
put_bh(bh);
if (need_resched()) {
spin_unlock(&journal->j_list_lock);
cond_resched();
spin_lock(&journal->j_list_lock);
}
}
...
And more precisely at this stage of the code:
jh = commit_transaction->t_locked_list->b_tprev;
8b 78 20 mov 0x20(%eax),%edi
bh = jh2bh(jh);
8b 1f mov (%edi),%ebx
get_bh(bh);
f0 ff 43 0c lock incl 0xc(%ebx) <-- crash because ebx is null
8b 03 mov (%ebx),%eax
Unfortunately I don't have the knowledge (and time to aquire it) that
will help me chase down this bug/problem.
If you need more information (including .config and other) I'll be happy
to provide it.
Can you CC: me as I'm not subscribed to the list.
Regards,
--
Brice Figureau <brice+lklm@daysofwonder.com>
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: 2.6.10-ac10 oops in journal_commit_transaction 2005-03-03 13:45 2.6.10-ac10 oops in journal_commit_transaction Brice Figureau @ 2005-03-03 23:37 ` Andrew Morton 2005-03-04 10:51 ` Brice Figureau 0 siblings, 1 reply; 7+ messages in thread From: Andrew Morton @ 2005-03-03 23:37 UTC (permalink / raw) To: Brice Figureau; +Cc: linux-kernel Brice Figureau <brice+lklm@daysofwonder.com> wrote: > > I'm reporting an oops on a bi-Xeon database server under 2.6.10-ac10 > quite similar to: > http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2 > > I also got another server crashing (a mail server this time), but I > couldn't get the oops/panic. > > This was after more than two weeks of uptime, I was running 2.6.10-ac1 > before and never got this problem. > > Here are the oops information: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000c > printing eip: > c01a858d > *pde = 00000000 > Oops: 0002 [#1] > PREEMPT SMP > Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables > CPU: 2 > EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI > EFLAGS: 00010286 (2.6.10-ac10) > EIP is at journal_commit_transaction+0x36d/0x1490 Please do: gdb vmlinux (gdb) l *0xc01a858d ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.10-ac10 oops in journal_commit_transaction 2005-03-03 23:37 ` Andrew Morton @ 2005-03-04 10:51 ` Brice Figureau 2005-03-06 22:59 ` Alan Cox 0 siblings, 1 reply; 7+ messages in thread From: Brice Figureau @ 2005-03-04 10:51 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel Hi Andrew, On Thu, 2005-03-03 at 15:37 -0800, Andrew Morton wrote: > Brice Figureau <brice+lklm@daysofwonder.com> wrote: [snip] > > Unable to handle kernel NULL pointer dereference at virtual address 0000000c > > printing eip: > > c01a858d > > *pde = 00000000 > > Oops: 0002 [#1] > > PREEMPT SMP > > Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables > > CPU: 2 > > EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI > > EFLAGS: 00010286 (2.6.10-ac10) > > EIP is at journal_commit_transaction+0x36d/0x1490 > > Please do: > > gdb vmlinux > (gdb) l *0xc01a858d Unfortunately this kernel is not compiled with CONFIG_DEBUG_INFO=y, so the above command does not work. But: (gdb) disassemble 0xc01a858d Dump of assembler code for function journal_commit_transaction: [snipped] ... 0xc01a8568 <journal_commit_transaction+840>: test %eax,%eax 0xc01a856a <journal_commit_transaction+842>: jne 0xc01a93bf <journal_commit_transaction+4511> 0xc01a8570 <journal_commit_transaction+848>: mov 0xfffffea8(%ebp),%edx 0xc01a8576 <journal_commit_transaction+854>: mov 0x18(%edx),%eax 0xc01a8579 <journal_commit_transaction+857>: test %eax,%eax 0xc01a857b <journal_commit_transaction+859>: je 0xc01a8606 <journal_commit_transaction+998> 0xc01a8581 <journal_commit_transaction+865>: mov $0xffffe000,%esi 0xc01a8586 <journal_commit_transaction+870>: and %esp,%esi 0xc01a8588 <journal_commit_transaction+872>: mov 0x20(%eax),%edi 0xc01a858b <journal_commit_transaction+875>: mov (%edi),%ebx 0xc01a858d <journal_commit_transaction+877>: lock incl 0xc(%ebx) 0xc01a8591 <journal_commit_transaction+881>: mov (%ebx),%eax 0xc01a8593 <journal_commit_transaction+883>: test $0x4,%al 0xc01a8595 <journal_commit_transaction+885>: jne 0xc01a9379 <journal_commit_transaction+4441> 0xc01a859b <journal_commit_transaction+891>: mov %ebx,0x4(%esp) 0xc01a859f <journal_commit_transaction+895>: mov 0x8(%ebp),%ecx 0xc01a85a2 <journal_commit_transaction+898>: mov %ecx,(%esp) 0xc01a85a5 <journal_commit_transaction+901>: call 0xc01a81d0 <inverted_lock> 0xc01a85aa <journal_commit_transaction+906>: test %eax,%eax 0xc01a85ac <journal_commit_transaction+908>: je 0xc01a9373 <journal_commit_transaction+4435> 0xc01a85b2 <journal_commit_transaction+914>: mov (%ebx),%eax 0xc01a85b4 <journal_commit_transaction+916>: test $0x20,%ah So I recompiled my kernel with DEBUG_CONFIG_INFO with the hope that the code won't move too far and I could find the code: On the kernel with *debug* enabled: (gdb) l *0xc01a858d 0xc01a858d is in journal_commit_transaction (buffer_head.h:104). 99 * Emit the buffer bitops functions. Note that there are also functions 100 * of the form "mark_buffer_foo()". These are higher-level functions which 101 * do something in addition to setting a b_state bit. 102 */ 103 BUFFER_FNS(Uptodate, uptodate) 104 BUFFER_FNS(Dirty, dirty) 105 TAS_BUFFER_FNS(Dirty, dirty) 106 BUFFER_FNS(Lock, locked) 107 TAS_BUFFER_FNS(Lock, locked) 108 BUFFER_FNS(Req, req) Which does not seem to match the code included in the oops. (gdb) disassemble 0xc01a858d [snip] 0xc01a85c8 <journal_commit_transaction+840>: test %eax,%eax 0xc01a85ca <journal_commit_transaction+842>: jne 0xc01a941f <journal_commit_transaction+4511> 0xc01a85d0 <journal_commit_transaction+848>: mov 0xfffffea8(%ebp),%edx 0xc01a85d6 <journal_commit_transaction+854>: mov 0x18(%edx),%eax 0xc01a85d9 <journal_commit_transaction+857>: test %eax,%eax 0xc01a85db <journal_commit_transaction+859>: je 0xc01a8666 <journal_commit_transaction+998> 0xc01a85e1 <journal_commit_transaction+865>: mov $0xffffe000,%esi 0xc01a85e6 <journal_commit_transaction+870>: and %esp,%esi 0xc01a85e8 <journal_commit_transaction+872>: mov 0x20(%eax),%edi 0xc01a85eb <journal_commit_transaction+875>: mov (%edi),%ebx 0xc01a85ed <journal_commit_transaction+877>: lock incl 0xc(%ebx) 0xc01a85f1 <journal_commit_transaction+881>: mov (%ebx),%eax 0xc01a85f3 <journal_commit_transaction+883>: test $0x4,%al 0xc01a85f5 <journal_commit_transaction+885>: jne 0xc01a93d9 <journal_commit_transaction+4441> 0xc01a85fb <journal_commit_transaction+891>: mov %ebx,0x4(%esp) 0xc01a85ff <journal_commit_transaction+895>: mov 0x8(%ebp),%ecx 0xc01a8602 <journal_commit_transaction+898>: mov %ecx,(%esp) 0xc01a8605 <journal_commit_transaction+901>: call 0xc01a8230 <inverted_lock> 0xc01a860a <journal_commit_transaction+906>: test %eax,%eax 0xc01a860c <journal_commit_transaction+908>: je 0xc01a93d3 <journal_commit_transaction+4435> 0xc01a8612 <journal_commit_transaction+914>: mov (%ebx),%eax 0xc01a8614 <journal_commit_transaction+916>: test $0x20,%ah So the same code is now at 0xc01a85ed: (gdb) l *0xc01a85ed 0xc01a85ed is in journal_commit_transaction (atomic.h:103). 98 * 99 * Atomically increments @v by 1. 100 */ 101 static __inline__ void atomic_inc(atomic_t *v) 102 { 103 __asm__ __volatile__( 104 LOCK "incl %0" 105 :"=m" (v->counter) 106 :"m" (v->counter)); 107 } It seems to me that get_bh is the culprit because of the following definition from include/linux/buffer_head.h: static inline void get_bh(struct buffer_head *bh) { atomic_inc(&bh->b_count); } I hope this will help you. Let me know if you need more information. Thanks for taking care of that problem, Regards, -- Brice Figureau <brice+lklm@daysofwonder.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.10-ac10 oops in journal_commit_transaction 2005-03-04 10:51 ` Brice Figureau @ 2005-03-06 22:59 ` Alan Cox 0 siblings, 0 replies; 7+ messages in thread From: Alan Cox @ 2005-03-06 22:59 UTC (permalink / raw) To: Brice Figureau; +Cc: Andrew Morton, Linux Kernel Mailing List FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix this one. Alan ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: 2.6.10-ac10 oops in journal_commit_transaction @ 2005-04-21 22:10 Zou, Nanhai 2005-04-21 22:29 ` Chris Wright 0 siblings, 1 reply; 7+ messages in thread From: Zou, Nanhai @ 2005-04-21 22:10 UTC (permalink / raw) To: Alan Cox, Brice Figureau; +Cc: Andrew Morton, Linux Kernel Mailing List Hi Alan, We have seen the same oops on the same point. Can you point to me the URL where the patch is? I am not sure which patch should I get. Thanks Zou Nan hai > -----Original Message----- > From: linux-kernel-owner@vger.kernel.org > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Alan Cox > Sent: Monday, March 07, 2005 6:59 AM > To: Brice Figureau > Cc: Andrew Morton; Linux Kernel Mailing List > Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction > > FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix > this one. > > Alan > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.10-ac10 oops in journal_commit_transaction 2005-04-21 22:10 Zou, Nanhai @ 2005-04-21 22:29 ` Chris Wright 2005-04-22 9:23 ` Alan Cox 0 siblings, 1 reply; 7+ messages in thread From: Chris Wright @ 2005-04-21 22:29 UTC (permalink / raw) To: Zou, Nanhai Cc: Alan Cox, Brice Figureau, Andrew Morton, Linux Kernel Mailing List * Zou, Nanhai (nanhai.zou@intel.com) wrote: > We have seen the same oops on the same point. > Can you point to me the URL where the patch is? > I am not sure which patch should I get. I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable 2.6.11.7 tree. The following patch is what went into 2.6.11.7: --- From: Stephen Tweedie Subject: Prevent race condition in jbd This patch from Stephen Tweedie which fixes a race in jbd code (it demonstrated itself as more or less random NULL dereferences in the journal code). Acked-by: Jan Kara <jack@suse.cz> Acked-by: Chris Mason <mason@suse.com> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> --- linux-2.6-ext3/fs/jbd/transaction.c.=K0000=.orig +++ linux-2.6-ext3/fs/jbd/transaction.c @@ -1775,10 +1775,10 @@ static int journal_unmap_buffer(journal_ JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget"); ret = __dispose_buffer(jh, journal->j_running_transaction); + journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); - journal_put_journal_head(jh); return ret; } else { /* There is no currently-running transaction. So the @@ -1789,10 +1789,10 @@ static int journal_unmap_buffer(journal_ JBUFFER_TRACE(jh, "give to committing trans"); ret = __dispose_buffer(jh, journal->j_committing_transaction); + journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); - journal_put_journal_head(jh); return ret; } else { /* The orphan record's transaction has @@ -1813,10 +1813,10 @@ static int journal_unmap_buffer(journal_ journal->j_running_transaction); jh->b_next_transaction = NULL; } + journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); - journal_put_journal_head(jh); return 0; } else { /* Good, the buffer belongs to the running transaction. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 2.6.10-ac10 oops in journal_commit_transaction 2005-04-21 22:29 ` Chris Wright @ 2005-04-22 9:23 ` Alan Cox 0 siblings, 0 replies; 7+ messages in thread From: Alan Cox @ 2005-04-22 9:23 UTC (permalink / raw) To: Chris Wright Cc: Zou, Nanhai, Brice Figureau, Andrew Morton, Linux Kernel Mailing List On Iau, 2005-04-21 at 23:29, Chris Wright wrote: > I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable > 2.6.11.7 tree. The following patch is what went into 2.6.11.7: 2.6.11.7 or 2.6.11ac7 (ie 2.6.11.7-ac 8)) both have this fixed. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-04-22 10:24 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-03-03 13:45 2.6.10-ac10 oops in journal_commit_transaction Brice Figureau 2005-03-03 23:37 ` Andrew Morton 2005-03-04 10:51 ` Brice Figureau 2005-03-06 22:59 ` Alan Cox -- strict thread matches above, loose matches on Subject: below -- 2005-04-21 22:10 Zou, Nanhai 2005-04-21 22:29 ` Chris Wright 2005-04-22 9:23 ` Alan Cox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox