public inbox for linux-kernel@vger.kernel.org
* Question about ext3 jbd module
@ 2006-07-25  8:06 bibo, mao
  2006-07-25  9:04 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: bibo, mao @ 2006-07-25  8:06 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel

Hi, 
   When I run the LTP stress test on my IA64 box with the distribution
kernel, the kernel crashes within three days. I think this
problem also exists in recent kernels, but I have not triggered
it there. The problem is in the function journal_dirty_metadata(),
which contains this statement:
	struct journal_head *jh = bh2jh(bh);
From my debugging, jh is NULL at this point; I do not know whether
this is due to contention or because the kernel forgets to handle the
NULL-pointer case.

I wrote a patch; with it applied the LTP stress test no longer crashes, but
I am not familiar with the filesystem code, so I do not know whether it
addresses the root cause or what negative side effects it may have.

Thanks
bibo,mao

--- linux-2.6.9/fs/jbd/transaction.c.orig       2006-06-30 14:05:58.000000000 +0800
+++ linux-2.6.9/fs/jbd/transaction.c    2006-07-07 02:56:32.000000000 +0800
@@ -1104,13 +1104,15 @@ int journal_dirty_metadata(handle_t *han
 {
        transaction_t *transaction = handle->h_transaction;
        journal_t *journal = transaction->t_journal;
-       struct journal_head *jh = bh2jh(bh);
+       struct journal_head *jh;

-       jbd_debug(5, "journal_head %p\n", jh);
-       JBUFFER_TRACE(jh, "entry");
        if (is_handle_aborted(handle))
                goto out;

+       jh = journal_add_journal_head(bh);
+       jbd_debug(5, "journal_head %p\n", jh);
+       JBUFFER_TRACE(jh, "entry");
+
        jbd_lock_bh_state(bh);

        /*
@@ -1154,6 +1156,7 @@ int journal_dirty_metadata(handle_t *han
        spin_unlock(&journal->j_list_lock);
 out_unlock_bh:
        jbd_unlock_bh_state(bh);
+       journal_put_journal_head(jh);
 out:
        JBUFFER_TRACE(jh, "exit");
        return 0;


	
inode02[28417]: NaT consumption 2216203124768 [1]
Modules linked in: nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) sunrpc(U) ds(U) yenta_socket(U) pc
mcia_core(U) vfat(U) fat(U) dm_mirror(U) dm_mod(U) joydev(U) button(U) uhci_hcd(U) ehci_hcd(U) shpchp(U) e1000(U) ext3(U) jbd(U) mptscsih(U) mptsas(U) mptspi
(U) mptfc(U) mptscsi(U) mptbase(U) sd_mod(U) scsi_mod(U)

Pid: 28417, CPU 13, comm:              inode02
psr : 0000121008126010 ifs : 800000000000040d ip  : [<a000000200134721>]    Not tainted
ip is at journal_dirty_metadata+0x2c1/0x5e0 [jbd]
unat: 0000000000000000 pfs : 0000000000000917 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 005965a026595569
ldrs: 0000000000000000 ccv : 0000000000060011 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000002001d3350 b6  : a000000100589f20 b7  : a0000001001fee60
f6  : 1003e0000000000000000 f7  : 1003e0000000000000080
f8  : 1003e00000000000008c1 f9  : 1003effffffffffffc0a0
f10 : 100049c8d719c0533ddf0 f11 : 1003e00000000000008c1
r1  : a000000200330000 r2  : a000000200158630 r3  : a000000200158630
r8  : e0000001d885631c r9  : 0000000000000000 r10 : e0000001bede66e0
r11 : 0000000000000010 r12 : e0000001a8a67d40 r13 : e0000001a8a60000
r14 : 0000000000060011 r15 : 0000000000000000 r16 : e0000001fef76e80
r17 : e0000001e3e34b38 r18 : 0000000000000020 r19 : 0000000000060011
r20 : 00000000000e0011 r21 : 0000000000060011 r22 : 0000000000000000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : e0000001e6cc8090
r26 : e0000001d88561e0 r27 : 0000000044bc2661 r28 : e0000001d8856078
r29 : 000000007c6fe61a r30 : e0000001e6cc80e8 r31 : e0000001d8856080

Call Trace:
 [<a000000100016b20>] show_stack+0x80/0xa0
                                sp=e0000001a8a67750 bsp=e0000001a8a61360
 [<a000000100017430>] show_regs+0x890/0x8c0
                                sp=e0000001a8a67920 bsp=e0000001a8a61318
 [<a00000010003dbf0>] die+0x150/0x240
                                sp=e0000001a8a67940 bsp=e0000001a8a612d8
 [<a00000010003dd20>] die_if_kernel+0x40/0x60
                                sp=e0000001a8a67940 bsp=e0000001a8a612a8
 [<a00000010003f930>] ia64_fault+0x1450/0x15a0
                                sp=e0000001a8a67940 bsp=e0000001a8a61250
 [<a00000010000f540>] ia64_leave_kernel+0x0/0x260
                                sp=e0000001a8a67b70 bsp=e0000001a8a61250
 [<a000000200134720>] journal_dirty_metadata+0x2c0/0x5e0 [jbd]
                                sp=e0000001a8a67d40 bsp=e0000001a8a611e0
 [<a0000002001d3350>] ext3_mark_iloc_dirty+0x750/0xc00 [ext3]
                                sp=e0000001a8a67d40 bsp=e0000001a8a61150
 [<a0000002001d3a90>] ext3_mark_inode_dirty+0xb0/0xe0 [ext3]
                                sp=e0000001a8a67d40 bsp=e0000001a8a61128
 [<a0000002001ce3d0>] ext3_new_inode+0x1670/0x1a80 [ext3]
                                sp=e0000001a8a67d60 bsp=e0000001a8a61068
 [<a0000002001df5e0>] ext3_mkdir+0x120/0x940 [ext3]
                                sp=e0000001a8a67da0 bsp=e0000001a8a61008
 [<a000000100148250>] vfs_mkdir+0x250/0x380
                                sp=e0000001a8a67db0 bsp=e0000001a8a60fb0
 [<a0000001001484f0>] sys_mkdir+0x170/0x280
                                sp=e0000001a8a67db0 bsp=e0000001a8a60f30
 [<a00000010000f3e0>] ia64_ret_from_syscall+0x0/0x20


* Re: Question about ext3 jbd module
  2006-07-25  8:06 Question about ext3 jbd module bibo, mao
@ 2006-07-25  9:04 ` Andrew Morton
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2006-07-25  9:04 UTC (permalink / raw)
  To: bibo, mao; +Cc: linux-kernel, ext2-devel

On Tue, 25 Jul 2006 08:06:02 +0000
"bibo, mao" <bibo.mao@intel.com> wrote:

> Hi, 
>    When I run the LTP stress test on my IA64 box with the distribution
> kernel, the kernel crashes within three days. I think this
> problem also exists in recent kernels, but I have not triggered
> it there. The problem is in the function journal_dirty_metadata(),
> which contains this statement:
> 	struct journal_head *jh = bh2jh(bh);
> From my debugging, jh is NULL at this point; I do not know whether
> this is due to contention or because the kernel forgets to handle the
> NULL-pointer case.
> 
> I wrote a patch; with it applied the LTP stress test no longer crashes, but
> I am not familiar with the filesystem code, so I do not know whether it
> addresses the root cause or what negative side effects it may have.
> 
> Thanks
> bibo,mao
> 
> --- linux-2.6.9/fs/jbd/transaction.c.orig       2006-06-30 14:05:58.000000000 +0800
> +++ linux-2.6.9/fs/jbd/transaction.c    2006-07-07 02:56:32.000000000 +0800
> @@ -1104,13 +1104,15 @@ int journal_dirty_metadata(handle_t *han
>  {
>         transaction_t *transaction = handle->h_transaction;
>         journal_t *journal = transaction->t_journal;
> -       struct journal_head *jh = bh2jh(bh);
> +       struct journal_head *jh;
> 
> -       jbd_debug(5, "journal_head %p\n", jh);
> -       JBUFFER_TRACE(jh, "entry");
>         if (is_handle_aborted(handle))
>                 goto out;
> 
> +       jh = journal_add_journal_head(bh);
> +       jbd_debug(5, "journal_head %p\n", jh);
> +       JBUFFER_TRACE(jh, "entry");
> +
>         jbd_lock_bh_state(bh);
> 
>         /*
> @@ -1154,6 +1156,7 @@ int journal_dirty_metadata(handle_t *han
>         spin_unlock(&journal->j_list_lock);
>  out_unlock_bh:
>         jbd_unlock_bh_state(bh);
> +       journal_put_journal_head(jh);
>  out:
>         JBUFFER_TRACE(jh, "exit");
>         return 0;
> 

That's a worry.  We've attached a journal_head and we've done
do_get_write_access() and we're now proceeding to journal the buffer as
metadata but someone has presumably gone and run
__journal_try_to_free_buffer() against the thing and has stolen our
journal_head.

Simply reattaching a new journal_head is most likely wrong - we'll lose
whatever state was supposed to be in the old one (like, which journal list
this bh+jh is on).

Somewhere, somehow, that journal_head has passed through a state which
permitted __journal_try_to_free_buffer() to free it while appropriate locks
were not held.  I wonder where.

Your diff headers claim to be against 2.6.9.  Is that so?

Would it be correct to assume that there was some page replacement pressure
happening at the time?

It looks like a big box - can you describe it a bit please?



* RE: Question about ext3 jbd module
@ 2006-07-25  9:59 Mao, Bibo
  0 siblings, 0 replies; 3+ messages in thread
From: Mao, Bibo @ 2006-07-25  9:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, ext2-devel

Yes, the kernel version is 2.6.9; it is the OS distribution kernel from RHEL4. I ran the LTP stress test, which includes memory, filesystem, and other stress tests. My machine has 4 physical IA64 CPUs, each dual-core with hyperthreading. I had added printk output at the head of journal_dirty_metadata(), and the jh value there was NULL.

The LTP tool version is ltp-full-20060412, the test case is testscripts/ltpstress.sh, and the stress test crashes with or without the patch in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=158363.

Thanks
Bibo,mao

>-----Original Message-----
>From: Andrew Morton [mailto:akpm@osdl.org]
>Sent: 25 July 2006 17:05
>To: Mao, Bibo
>Cc: linux-kernel@vger.kernel.org; ext2-devel@lists.sourceforge.net
>Subject: Re: Question about ext3 jbd module
>
>On Tue, 25 Jul 2006 08:06:02 +0000
>"bibo, mao" <bibo.mao@intel.com> wrote:
>
>That's a worry.  We've attached a journal_head and we've done
>do_get_write_access() and we're now proceeding to journal the buffer as
>metadata but someone has presumably gone and run
>__journal_try_to_free_buffer() against the thing and has stolen our
>journal_head.
>
>Simply reattaching a new journal_head is most likely wrong - we'll lose
>whatever state was supposed to be in the old one (like, which journal list
>this bh+jh is on).
>
>Somewhere, somehow, that journal_head has passed through a state which
>permitted __journal_try_to_free_buffer() to free it while appropriate locks
>were not held.  I wonder where.
>
>Your diff headers claim to be against 2.6.9.  Is that so?
>
>Would it be correct to assume that there was some page replacement pressure
>happening at the time?
>
>It looks like a big box - can you describe it a bit please?

