* RE: Question about ext3 jbd module
@ 2006-07-25 9:59 Mao, Bibo
2006-07-27 13:30 ` [Ext2-devel] " Stephen C. Tweedie
0 siblings, 1 reply; 4+ messages in thread
From: Mao, Bibo @ 2006-07-25 9:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, ext2-devel
Yes, kernel version is 2.6.9, it is OS distribution kernel RHEL4. I run LTP stress test, which includes memory, file system etc stress test. And my machine has 4 physical IA64 CPU with dual core and hyptherthread function. I ever added printk information in the head of function journal_dirty_metadata(), jh value is NULL.
The LTP tool version is ltp-full-20060412, test case is testscripts/ltpstress.sh, and stress test will crash with/without patch in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=158363,
Thanks
Bibo,mao
>-----Original Message-----
>From: Andrew Morton [mailto:akpm@osdl.org]
>Sent: 2006年7月25日 17:05
>To: Mao, Bibo
>Cc: linux-kernel@vger.kernel.org; ext2-devel@lists.sourceforge.net
>Subject: Re: Question about ext3 jbd module
>
>On Tue, 25 Jul 2006 08:06:02 +0000
>"bibo, mao" <bibo.mao@intel.com> wrote:
>
>> Hi,
>> When I run LTP stress test on my IA64 box based on ditribution
>> kernel version, kernel will crash within three days, I think this
>> problem should exist in recent kernel also, but I does not trigger
>> this. The problem is in function journal_dirty_metadata(),
>> there is one sentence like this:
>> struct journal_head *jh = bh2jh(bh);
>> >From my debug result, jh is NULL at this point, I do not know whether
>> it is because of contention or kernel forgets to consider NULL pointer
>> condition.
>>
>> I wrote one patch, after this patch LTP stress test does not crash, but
>> I am not familiar filesystem, I do not know whether it is the root cause
>> or what negative influence this patch will bring out.
>>
>> Thanks
>> bibo,mao
>>
>> --- linux-2.6.9/fs/jbd/transaction.c.orig 2006-06-30
>14:05:58.000000000 +0800
>> +++ linux-2.6.9/fs/jbd/transaction.c 2006-07-07 02:56:32.000000000 +0800
>> @@ -1104,13 +1104,15 @@ int journal_dirty_metadata(handle_t *han
>> {
>> transaction_t *transaction = handle->h_transaction;
>> journal_t *journal = transaction->t_journal;
>> - struct journal_head *jh = bh2jh(bh);
>> + struct journal_head *jh;
>>
>> - jbd_debug(5, "journal_head %p\n", jh);
>> - JBUFFER_TRACE(jh, "entry");
>> if (is_handle_aborted(handle))
>> goto out;
>>
>> + jh = journal_add_journal_head(bh);
>> + jbd_debug(5, "journal_head %p\n", jh);
>> + JBUFFER_TRACE(jh, "entry");
>> +
>> jbd_lock_bh_state(bh);
>>
>> /*
>> @@ -1154,6 +1156,7 @@ int journal_dirty_metadata(handle_t *han
>> spin_unlock(&journal->j_list_lock);
>> out_unlock_bh:
>> jbd_unlock_bh_state(bh);
>> + journal_put_journal_head(jh);
>> out:
>> JBUFFER_TRACE(jh, "exit");
>> return 0;
>>
>
>That's a worry. We've attached a journal_head and we've done
>do_get_write_access() and we're now proceeding to journal the buffer as
>metadata but someone has presumably gone and run
>__journal_try_to_free_buffer() against the thing and has stolen our
>journal_head.
>
>Simply reattaching a new journal_head is most likely wrong - we'll lose
>whatever state was supposed to be in the old one (like, which journal list
>this bh+jh is on).
>
>Somewhere, somehow, that journal_head has passed through a state which
>permitted __journal_try_to_free_buffer() to free it while appropriate locks
>were not held. I wonder where.
>
>Your diff headers claim to be against 2.6.9. Is that so?
>
>Would it be correct to assume that there was some page replacement pressure
>happening at the time?
>
>It looks like a big box - can you describe it a bit please?
>
>> Pid: 28417, CPU 13, comm: inode02
>> psr : 0000121008126010 ifs : 800000000000040d ip : [<a000000200134721>]
>Not tainted
>> ip is at journal_dirty_metadata+0x2c1/0x5e0 [jbd]
>> unat: 0000000000000000 pfs : 0000000000000917 rsc : 0000000000000003
>> rnat: 0000000000000000 bsps: 0000000000000000 pr : 005965a026595569
>> ldrs: 0000000000000000 ccv : 0000000000060011 fpsr: 0009804c8a70033f
>> csd : 0000000000000000 ssd : 0000000000000000
>> b0 : a0000002001d3350 b6 : a000000100589f20 b7 : a0000001001fee60
>> f6 : 1003e0000000000000000 f7 : 1003e0000000000000080
>> f8 : 1003e00000000000008c1 f9 : 1003effffffffffffc0a0
>> f10 : 100049c8d719c0533ddf0 f11 : 1003e00000000000008c1
>> r1 : a000000200330000 r2 : a000000200158630 r3 : a000000200158630
>> r8 : e0000001d885631c r9 : 0000000000000000 r10 : e0000001bede66e0
>> r11 : 0000000000000010 r12 : e0000001a8a67d40 r13 : e0000001a8a60000
>> r14 : 0000000000060011 r15 : 0000000000000000 r16 : e0000001fef76e80
>> r17 : e0000001e3e34b38 r18 : 0000000000000020 r19 : 0000000000060011
>> r20 : 00000000000e0011 r21 : 0000000000060011 r22 : 0000000000000000
>> r23 : 0000000000000000 r24 : 0000000000000000 r25 : e0000001e6cc8090
>> r26 : e0000001d88561e0 r27 : 0000000044bc2661 r28 : e0000001d8856078
>> r29 : 000000007c6fe61a r30 : e0000001e6cc80e8 r31 : e0000001d8856080
>>
>> Call Trace:
>> [<a000000100016b20>] show_stack+0x80/0xa0
>> sp=e0000001a8a67750 bsp=e0000001a8a61360
>> [<a000000100017430>] show_regs+0x890/0x8c0
>> sp=e0000001a8a67920 bsp=e0000001a8a61318
>> [<a00000010003dbf0>] die+0x150/0x240
>> sp=e0000001a8a67940 bsp=e0000001a8a612d8
>> [<a00000010003dd20>] die_if_kernel+0x40/0x60
>> sp=e0000001a8a67940 bsp=e0000001a8a612a8
>> [<a00000010003f930>] ia64_fault+0x1450/0x15a0
>> sp=e0000001a8a67940 bsp=e0000001a8a61250
>> [<a00000010000f540>] ia64_leave_kernel+0x0/0x260
>> sp=e0000001a8a67b70 bsp=e0000001a8a61250
>> [<a000000200134720>] journal_dirty_metadata+0x2c0/0x5e0 [jbd]
>> sp=e0000001a8a67d40 bsp=e0000001a8a611e0
>> [<a0000002001d3350>] ext3_mark_iloc_dirty+0x750/0xc00 [ext3]
>> sp=e0000001a8a67d40 bsp=e0000001a8a61150
>> [<a0000002001d3a90>] ext3_mark_inode_dirty+0xb0/0xe0 [ext3]
>> sp=e0000001a8a67d40 bsp=e0000001a8a61128
>> [<a0000002001ce3d0>] ext3_new_inode+0x1670/0x1a80 [ext3]
>> sp=e0000001a8a67d60 bsp=e0000001a8a61068
>> [<a0000002001df5e0>] ext3_mkdir+0x120/0x940 [ext3]
>> sp=e0000001a8a67da0 bsp=e0000001a8a61008
>> [<a000000100148250>] vfs_mkdir+0x250/0x380
>> sp=e0000001a8a67db0 bsp=e0000001a8a60fb0
>> [<a0000001001484f0>] sys_mkdir+0x170/0x280
>> sp=e0000001a8a67db0 bsp=e0000001a8a60f30
>> [<a00000010000f3e0>] ia64_ret_from_syscall+0x0/0x20
>>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [Ext2-devel] Question about ext3 jbd module
2006-07-25 9:59 Question about ext3 jbd module Mao, Bibo
@ 2006-07-27 13:30 ` Stephen C. Tweedie
2006-07-27 13:45 ` bibo,mao
0 siblings, 1 reply; 4+ messages in thread
From: Stephen C. Tweedie @ 2006-07-27 13:30 UTC (permalink / raw)
To: Mao, Bibo
Cc: Andrew Morton, ext2-devel@lists.sourceforge.net, linux-kernel,
Stephen Tweedie
Hi,
On Tue, 2006-07-25 at 17:59 +0800, Mao, Bibo wrote:
> Yes, kernel version is 2.6.9, it is OS distribution kernel RHEL4.
Which version? There were a few upstream problems like this fixed in
2.6.11 or so, and I think current RHEL-4 updates should include those.
I'm off on holiday/family wedding tomorrow for a couple of weeks, so
replies might be slow...
--Stephen
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Ext2-devel] Question about ext3 jbd module
2006-07-27 13:30 ` [Ext2-devel] " Stephen C. Tweedie
@ 2006-07-27 13:45 ` bibo,mao
2006-07-27 21:34 ` Jarod Wilson
0 siblings, 1 reply; 4+ messages in thread
From: bibo,mao @ 2006-07-27 13:45 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Andrew Morton, ext2-devel, linux-kernel
System crashed in RHEL4 U3 version, I doubt why there is no judgement
about whether jh is NULL or not in function journal_dirty_metadata().
thanks
bibo,mao
Stephen C. Tweedie wrote:
> Hi,
>
> On Tue, 2006-07-25 at 17:59 +0800, Mao, Bibo wrote:
> > Yes, kernel version is 2.6.9, it is OS distribution kernel RHEL4.
>
> Which version? There were a few upstream problems like this fixed in
> 2.6.11 or so, and I think current RHEL-4 updates should include those.
>
> I'm off on holiday/family wedding tomorrow for a couple of weeks, so
> replies might be slow...
>
> --Stephen
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Ext2-devel] Question about ext3 jbd module
2006-07-27 13:45 ` bibo,mao
@ 2006-07-27 21:34 ` Jarod Wilson
0 siblings, 0 replies; 4+ messages in thread
From: Jarod Wilson @ 2006-07-27 21:34 UTC (permalink / raw)
To: bibo,mao; +Cc: Stephen C. Tweedie, Andrew Morton, ext2-devel, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 865 bytes --]
On Thu, 2006-07-27 at 21:45 +0800, bibo,mao wrote:
> System crashed in RHEL4 U3 version, I doubt why there is no judgement
> about whether jh is NULL or not in function journal_dirty_metadata().
We're actually testing a patch we think may fix this problem right now,
assuming this is
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199667
> Stephen C. Tweedie wrote:
> > Hi,
> >
> > On Tue, 2006-07-25 at 17:59 +0800, Mao, Bibo wrote:
> > > Yes, kernel version is 2.6.9, it is OS distribution kernel RHEL4.
> >
> > Which version? There were a few upstream problems like this fixed in
> > 2.6.11 or so, and I think current RHEL-4 updates should include those.
> >
> > I'm off on holiday/family wedding tomorrow for a couple of weeks, so
> > replies might be slow...
> >
> > --Stephen
--
Jarod Wilson
jwilson@redhat.com
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2006-07-27 21:36 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-25 9:59 Question about ext3 jbd module Mao, Bibo
2006-07-27 13:30 ` [Ext2-devel] " Stephen C. Tweedie
2006-07-27 13:45 ` bibo,mao
2006-07-27 21:34 ` Jarod Wilson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox