[Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
@ 2004-08-23  3:05 Ling, Xiaofeng
  2004-08-23 14:16 ` Mark Fasheh
  0 siblings, 1 reply; 8+ messages in thread
From: Ling, Xiaofeng @ 2004-08-23  3:05 UTC (permalink / raw)
  To: ocfs2-devel

before umount, all the lock shall be released first.
this patch can resolve the problem

------------------------------------------------
Index: super.c
===================================================================
--- super.c (revision 1370)
+++ super.c (working copy)
@@ -224,6 +224,7 @@
    tid_t target;

    sb->s_dirt = 0;
+   ocfs_commit_cache(OCFS2_SB(sb));
    target = log_start_commit(OCFS2_SB(sb)->journal->k_journal, NULL);
    log_wait_commit(OCFS2_SB(sb)->journal->k_journal, target);
    return 0;
@@ -234,6 +235,7 @@
    tid_t target;

    sb->s_dirt = 0;
+   ocfs_commit_cache(OCFS2_SB(sb));
    if (journal_start_commit(OCFS2_SB(sb)->journal->k_journal, &target))
{
        if (wait)
            log_wait_commit(OCFS2_SB(sb)->journal->k_journal,
Index: journal.c
===================================================================
--- journal.c   (revision 1370)
+++ journal.c   (working copy)
@@ -61,7 +61,7 @@
                   struct inode *inode);
 static int ocfs_recover_node(struct _ocfs_super *osb, int node_num);
 static int __ocfs_recovery_thread(void *arg);
-static int ocfs_commit_cache (ocfs_super * osb);
+int ocfs_commit_cache (ocfs_super * osb);
 static int ocfs_wait_on_mount(ocfs_super *osb);
 static void ocfs_handle_move_locks(ocfs_journal *journal,
                   ocfs_journal_handle *handle);
@@ -149,7 +149,7 @@
  * This is in journal.c for lack of a better place.
  *
  */
-static int ocfs_commit_cache(ocfs_super *osb)
+int ocfs_commit_cache(ocfs_super *osb)
 {
    int status = 0, tmpstat;
    ocfs_journal * journal = NULL;


-------------------
Ling Xiaofeng(Daniel)

Intel China Software Lab.
iNet: 8-752-1243
8621-52574545-1243(O)

xfling@users.sourceforge.net
Opinions are my own and don't represent those of my employer 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
  2004-08-23  3:05 Ling, Xiaofeng
@ 2004-08-23 14:16 ` Mark Fasheh
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Fasheh @ 2004-08-23 14:16 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Aug 23, 2004 at 04:05:57PM +0800, Ling, Xiaofeng wrote:
> before umount, all the lock shall be released first.
> this patch can resolve the problem
This isn't the way we want to handle this. The way it's supposed to work is
that the umount process sends a signal to the commit thread to shutdown and
then waits on it's last set of checkpoints / releases. I'd much rather fix
what's broken than patch over it :)

I'm actually looking at this bug right now and I think it's due to our
signal handling code...
	--Mark

> 
> ------------------------------------------------
> Index: super.c
> ===================================================================
> --- super.c (revision 1370)
> +++ super.c (working copy)
> @@ -224,6 +224,7 @@
>     tid_t target;
> 
>     sb->s_dirt = 0;
> +   ocfs_commit_cache(OCFS2_SB(sb));
>     target = log_start_commit(OCFS2_SB(sb)->journal->k_journal, NULL);
>     log_wait_commit(OCFS2_SB(sb)->journal->k_journal, target);
>     return 0;
> @@ -234,6 +235,7 @@
>     tid_t target;
> 
>     sb->s_dirt = 0;
> +   ocfs_commit_cache(OCFS2_SB(sb));
>     if (journal_start_commit(OCFS2_SB(sb)->journal->k_journal, &target))
> {
>         if (wait)
>             log_wait_commit(OCFS2_SB(sb)->journal->k_journal,
> Index: journal.c
> ===================================================================
> --- journal.c   (revision 1370)
> +++ journal.c   (working copy)
> @@ -61,7 +61,7 @@
>                    struct inode *inode);
>  static int ocfs_recover_node(struct _ocfs_super *osb, int node_num);
>  static int __ocfs_recovery_thread(void *arg);
> -static int ocfs_commit_cache (ocfs_super * osb);
> +int ocfs_commit_cache (ocfs_super * osb);
>  static int ocfs_wait_on_mount(ocfs_super *osb);
>  static void ocfs_handle_move_locks(ocfs_journal *journal,
>                    ocfs_journal_handle *handle);
> @@ -149,7 +149,7 @@
>   * This is in journal.c for lack of a better place.
>   *
>   */
> -static int ocfs_commit_cache(ocfs_super *osb)
> +int ocfs_commit_cache(ocfs_super *osb)
>  {
>     int status = 0, tmpstat;
>     ocfs_journal * journal = NULL;
> 
> 
> -------------------
> Ling Xiaofeng(Daniel)
> 
> Intel China Software Lab.
> iNet: 8-752-1243
> 8621-52574545-1243(O)
> 
> xfling@users.sourceforge.net
> Opinions are my own and don't represent those of my employer 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
@ 2004-08-23 22:42 Ling, Xiaofeng
  2004-08-24 13:07 ` Mark Fasheh
  0 siblings, 1 reply; 8+ messages in thread
From: Ling, Xiaofeng @ 2004-08-23 22:42 UTC (permalink / raw)
  To: ocfs2-devel

The current process is=20
ocfs_clear_inode->
ocfs_dismount_volumn->
ocfs_journal_shutdown->
wake_up(&osb->flush_event)
although it seems this can release the lock. but the bug still exist.
The bug is triggered in submit_bh,=20
in kernel, the process is
do_umount->
fsync_super->
ocfs_sync_fs, sync_blockdev, sync_inodes_sb->
ocfs_clear_inode
So I guess the bug is triggered in sync_blockdev before ocfs_clear_inode =
then put that in ocfs_clear_inode is too later.
I've tried to move ocfs_dismount_volumn or ocfs_journal_shutdown to =
ocfs_sync_fs, but that will cause other problems.
and if just put ocfs_commit_cache in it, it's ok.
If you don't want an extra call to ocfs_commit_cache, we need to  find a =
better place to put ocfs_dismount_volumn.

=20

>-----Original Message-----
>From: Mark Fasheh [mailto:mark.fasheh@oracle.com]=20
>Sent: 2004=C4=EA8=D4=C224=C8=D5 3:17
>To: Ling, Xiaofeng
>Cc: ocfs2-devel@oss.oracle.com
>Subject: Re: [Ocfs2-devel] [PATCH] fix big 116 -- umount cause=20
>crash after some operation
>
>On Mon, Aug 23, 2004 at 04:05:57PM +0800, Ling, Xiaofeng wrote:
>> before umount, all the lock shall be released first.
>> this patch can resolve the problem
>This isn't the way we want to handle this. The way it's=20
>supposed to work is
>that the umount process sends a signal to the commit thread to=20
>shutdown and
>then waits on it's last set of checkpoints / releases. I'd=20
>much rather fix
>what's broken than patch over it :)
>
>I'm actually looking at this bug right now and I think it's due to our
>signal handling code...
>	--Mark
>
>>=20
>> ------------------------------------------------
>> Index: super.c
>> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- super.c (revision 1370)
>> +++ super.c (working copy)
>> @@ -224,6 +224,7 @@
>>     tid_t target;
>>=20
>>     sb->s_dirt =3D 0;
>> +   ocfs_commit_cache(OCFS2_SB(sb));
>>     target =3D=20
>log_start_commit(OCFS2_SB(sb)->journal->k_journal, NULL);
>>     log_wait_commit(OCFS2_SB(sb)->journal->k_journal, target);
>>     return 0;
>> @@ -234,6 +235,7 @@
>>     tid_t target;
>>=20
>>     sb->s_dirt =3D 0;
>> +   ocfs_commit_cache(OCFS2_SB(sb));
>>     if=20
>(journal_start_commit(OCFS2_SB(sb)->journal->k_journal, &target))
>> {
>>         if (wait)
>>             log_wait_commit(OCFS2_SB(sb)->journal->k_journal,
>> Index: journal.c
>> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- journal.c   (revision 1370)
>> +++ journal.c   (working copy)
>> @@ -61,7 +61,7 @@
>>                    struct inode *inode);
>>  static int ocfs_recover_node(struct _ocfs_super *osb, int node_num);
>>  static int __ocfs_recovery_thread(void *arg);
>> -static int ocfs_commit_cache (ocfs_super * osb);
>> +int ocfs_commit_cache (ocfs_super * osb);
>>  static int ocfs_wait_on_mount(ocfs_super *osb);
>>  static void ocfs_handle_move_locks(ocfs_journal *journal,
>>                    ocfs_journal_handle *handle);
>> @@ -149,7 +149,7 @@
>>   * This is in journal.c for lack of a better place.
>>   *
>>   */
>> -static int ocfs_commit_cache(ocfs_super *osb)
>> +int ocfs_commit_cache(ocfs_super *osb)
>>  {
>>     int status =3D 0, tmpstat;
>>     ocfs_journal * journal =3D NULL;
>>=20
>>=20
>> -------------------
>> Ling Xiaofeng(Daniel)
>>=20
>> Intel China Software Lab.
>> iNet: 8-752-1243
>> 8621-52574545-1243(O)
>>=20
>> xfling@users.sourceforge.net
>> Opinions are my own and don't represent those of my employer=20
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>--
>Mark Fasheh
>Software Developer, Oracle Corp
>mark.fasheh@oracle.com
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
  2004-08-23 22:42 Ling, Xiaofeng
@ 2004-08-24 13:07 ` Mark Fasheh
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Fasheh @ 2004-08-24 13:07 UTC (permalink / raw)
  To: ocfs2-devel


On Tue, Aug 24, 2004 at 11:42:44AM +0800, Ling, Xiaofeng wrote:
> The current process is 
> ocfs_clear_inode->
> ocfs_dismount_volumn->
> ocfs_journal_shutdown->
> wake_up(&osb->flush_event)
> although it seems this can release the lock. but the bug still exist.
> The bug is triggered in submit_bh, 
> in kernel, the process is
> do_umount->
> fsync_super->
> ocfs_sync_fs, sync_blockdev, sync_inodes_sb->
> ocfs_clear_inode
> So I guess the bug is triggered in sync_blockdev before ocfs_clear_inode then put that in ocfs_clear_inode is too later.
> I've tried to move ocfs_dismount_volumn or ocfs_journal_shutdown to ocfs_sync_fs, but that will cause other problems.
> and if just put ocfs_commit_cache in it, it's ok.
> If you don't want an extra call to ocfs_commit_cache, we need to  find a better place to put ocfs_dismount_volumn.
 
Do you have a kernel stack trace? I want to make sure we're talking about
the same bug (s) here. The 1st one I dealt with was only that commit thread
was trying to release the locks with a signal pending which caused the code
in dlm.c to freak out. I removed signal handling from it altogether and just
use our ocfs_wait function instead. This seemed to alleviate that problem.
Issue #2 I see during umount is that occasionally the ocfs2 nm thread will 
crash in __make_request:

------------[ cut here ]------------
kernel BUG at ll_rw_blk.c:1014!
invalid operand: 0000
ocfs2 nfs lockd sunrpc parport_pc lp parport netconsole autofs tg3 floppy sg
loo
p lvm-mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd qla2300
aic7xxx
CPU:    6
EIP:    0060:[<c01ccad1>]    Not tainted
EFLAGS: 00010246

EIP is at __make_request [kernel] 0x81 (2.4.21-15.ELsmp/i686)
eax: 00000800   ebx: 00000000   ecx: 00000b40   edx: 00000008
esi: f3cbceec   edi: f6285618   ebp: 00000b40   esp: c6f6fdc4
ds: 0068   es: 0068   ss: 0068
Process ocfs2nm-0 (pid: 9164, stackpage=c6f6f000)
Stack: f6285600 c0435180 00000006 c53ee000 c6f6fe18 c0123274 c0436680
c6f6e000 
       c53ee000 000000ff 00000800 f6285640 c6f6e000 c0436680 00000000
00000008 
       00000b40 00000006 f3cbceec 00000008 013fe5b9 00000b40 c01cd22a
f6285618 
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xc6f6fdd8)
[<c01cd22a>] generic_make_request [kernel] 0xea (0xc6f6fe1c)
[<c01cd2c9>] submit_bh_rsector [kernel] 0x49 (0xc6f6fe44)
[<f8b05bbd>] ocfs_read_bhs [ocfs2] 0x641 (0xc6f6fe60)
[<c013334b>] del_timer_sync [kernel] 0x1b (0xc6f6fe94)
[<f8b1c418>] ocfs_volume_thread [ocfs2] 0x25c (0xc6f6fec0)
[<c0122450>] load_balance [kernel] 0x30 (0xc6f6ff0c)
[<c013dd94>] clear_page_tables [kernel] 0x64 (0xc6f6ff24)
[<c0123274>] schedule [kernel] 0x2f4 (0xc6f6ff4c)
[<c012c4c1>] exit_notify [kernel] 0x121 (0xc6f6ff70)
[<c012c9c0>] do_exit [kernel] 0x370 (0xc6f6ff90)
[<f8b1c1bc>] ocfs_volume_thread [ocfs2] 0x0 (0xc6f6ffe0)
[<c010958d>] kernel_thread_helper [kernel] 0x5 (0xc6f6fff0)

Code: 0f 0b f6 03 7c d4 2b c0 89 74 24 08 89 5c 24 04 89 3c 24 e8

I'm still trying to track this one down. If we're still not on the same
page, some more explanation would be helpfull :) Thanks,
	--Mark
>  
> 
> >-----Original Message-----
> >From: Mark Fasheh [mailto:mark.fasheh@oracle.com] 
> >Sent: 2004??8??24?? 3:17
> >To: Ling, Xiaofeng
> >Cc: ocfs2-devel@oss.oracle.com
> >Subject: Re: [Ocfs2-devel] [PATCH] fix big 116 -- umount cause 
> >crash after some operation
> >
> >On Mon, Aug 23, 2004 at 04:05:57PM +0800, Ling, Xiaofeng wrote:
> >> before umount, all the lock shall be released first.
> >> this patch can resolve the problem
> >This isn't the way we want to handle this. The way it's 
> >supposed to work is
> >that the umount process sends a signal to the commit thread to 
> >shutdown and
> >then waits on it's last set of checkpoints / releases. I'd 
> >much rather fix
> >what's broken than patch over it :)
> >
> >I'm actually looking at this bug right now and I think it's due to our
> >signal handling code...
> >	--Mark
> >
> >> 
> >> ------------------------------------------------
> >> Index: super.c
> >> ===================================================================
> >> --- super.c (revision 1370)
> >> +++ super.c (working copy)
> >> @@ -224,6 +224,7 @@
> >>     tid_t target;
> >> 
> >>     sb->s_dirt = 0;
> >> +   ocfs_commit_cache(OCFS2_SB(sb));
> >>     target = 
> >log_start_commit(OCFS2_SB(sb)->journal->k_journal, NULL);
> >>     log_wait_commit(OCFS2_SB(sb)->journal->k_journal, target);
> >>     return 0;
> >> @@ -234,6 +235,7 @@
> >>     tid_t target;
> >> 
> >>     sb->s_dirt = 0;
> >> +   ocfs_commit_cache(OCFS2_SB(sb));
> >>     if 
> >(journal_start_commit(OCFS2_SB(sb)->journal->k_journal, &target))
> >> {
> >>         if (wait)
> >>             log_wait_commit(OCFS2_SB(sb)->journal->k_journal,
> >> Index: journal.c
> >> ===================================================================
> >> --- journal.c   (revision 1370)
> >> +++ journal.c   (working copy)
> >> @@ -61,7 +61,7 @@
> >>                    struct inode *inode);
> >>  static int ocfs_recover_node(struct _ocfs_super *osb, int node_num);
> >>  static int __ocfs_recovery_thread(void *arg);
> >> -static int ocfs_commit_cache (ocfs_super * osb);
> >> +int ocfs_commit_cache (ocfs_super * osb);
> >>  static int ocfs_wait_on_mount(ocfs_super *osb);
> >>  static void ocfs_handle_move_locks(ocfs_journal *journal,
> >>                    ocfs_journal_handle *handle);
> >> @@ -149,7 +149,7 @@
> >>   * This is in journal.c for lack of a better place.
> >>   *
> >>   */
> >> -static int ocfs_commit_cache(ocfs_super *osb)
> >> +int ocfs_commit_cache(ocfs_super *osb)
> >>  {
> >>     int status = 0, tmpstat;
> >>     ocfs_journal * journal = NULL;
> >> 
> >> 
> >> -------------------
> >> Ling Xiaofeng(Daniel)
> >> 
> >> Intel China Software Lab.
> >> iNet: 8-752-1243
> >> 8621-52574545-1243(O)
> >> 
> >> xfling@users.sourceforge.net
> >> Opinions are my own and don't represent those of my employer 
> >> _______________________________________________
> >> Ocfs2-devel mailing list
> >> Ocfs2-devel@oss.oracle.com
> >> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
> >--
> >Mark Fasheh
> >Software Developer, Oracle Corp
> >mark.fasheh@oracle.com
> >
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
@ 2004-08-24 20:04 Ling, Xiaofeng
  2004-08-25 12:39 ` Mark Fasheh
  0 siblings, 1 reply; 8+ messages in thread
From: Ling, Xiaofeng @ 2004-08-24 20:04 UTC (permalink / raw)
  To: ocfs2-devel

>Do you have a kernel stack trace? I want to make sure we're 
>talking about
>the same bug (s) here. The 1st one I dealt with was only that 
>commit thread
>was trying to release the locks with a signal pending which 
>caused the code
>in dlm.c to freak out. I removed signal handling from it 
>altogether and just
>use our ocfs_wait function instead. This seemed to alleviate 
>that problem.
>Issue #2 I see during umount is that occasionally the ocfs2 nm 
>thread will 
>crash in __make_request:
So what we see is not the same. 
The bug 116 is easy to reproduce, just do a command line as
root#mkdir /ocfs/aaa;umount /ocfs
follow the umount to mkdir tightly.


============================================
 call dismount now
(4577) ENTRY: ocfs_put_super(0xdb40ba00)
(4577) TRACE: ocfs_put_super() put super... do nothing!  DONE!!!!
(4577) EXIT : ocfs_put_super()
VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice
day...
------------[ cut here ]------------
kernel BUG at fs/buffer.c:2703!
invalid operand: 0000 [#2]
CPU:    0
EIP:    0060:[<c014bb80>]    Not tainted
EFLAGS: 00010202   (2.6.6kdb)
EIP is at submit_bh+0x153/0x16d
eax: 00000010   ebx: 00000008   ecx: d5811ea0   edx: d5811ea0
esi: d5811ea0   edi: 00001000   ebp: 00000000   esp: d75cbe48
ds: 007b   es: 007b   ss: 0068
Process ocfs2nm-0 (pid: 4572, threadinfo=d75ca000 task=dfb21230)
Stack: c0373a72 d75cbe9c 00000046 df45dc00 df45dc00 d5811820 00000000
00000008
       d5811ea0 00001000 00000000 e09dc3e6 00000000 d5811ea0 000000d8
3f5f0e04
       00003c26 dfb213d8 03ecda38 d75cbeb0 d75ca000 d75ca000 d75ca000
d75ca000
Call Trace:
 [<c0373a72>] schedule+0x30e/0x50c
 [<e09dc3e6>] ocfs_read_bhs+0x6e3/0x849 [ocfs2]
 [<e09f5c16>] ocfs_volume_thread+0x2a6/0x75c [ocfs2]
 [<c0118f21>] exit_notify+0x2b0/0x772
 [<c01196a8>] do_exit+0x2c5/0x34d
 [<c01197cf>] do_group_exit+0x52/0x72
 [<c0105d4a>] ret_from_fork+0x6/0x14
 [<e09f5970>] ocfs_volume_thread+0x0/0x75c [ocfs2]
 [<e09f5970>] ocfs_volume_thread+0x0/0x75c [ocfs2]
 [<c0104291>] kernel_thread_helper+0x5/0xb

Code: 0f 0b 8f 0a 99 6d 39 c0 e9 cb fe ff ff 0f 0b 8e 0a 99 6d 39

ds: 007b   es: 007b   ss: 0068
Process ocfs2cmt-0 (pid: 4575, threadinfo=dd0a4000 task=df2c01b0)
Stack: 00000000 dd0a5d94 00000282 40016863 00000000 d8064d80 d941a924
db8cb000
       d3044888 0000010d d8064d80 c014bc7a 00000001 d3044888 db8cb000
dfbcd354
       db8cb000 dfbcd354 c018fd2f d3044888 d941a924 c018d92e d3044888
0000000b
Call Trace:
 [<c014bc7a>] sync_dirty_buffer+0x5c/0xaf
 [<c018fd2f>] journal_update_superblock+0x60/0xa1
 [<c018d92e>] __try_to_free_cp_buf+0x46/0x4d
 [<c018de05>] cleanup_journal_tail+0x7a/0xe1
 [<c018dcec>] log_do_checkpoint+0xd6/0x175
 [<c0131ba6>] buffered_rmqueue+0xc4/0x15c
 [<c0131cdc>] __alloc_pages+0x9e/0x316
 [<c013ab73>] do_anonymous_page+0xdc/0x190
 [<c013ac8c>] do_no_page+0x65/0x2d9
 [<c018a605>] journal_lock_updates+0xbd/0xcf
 [<c0115671>] autoremove_wake_function+0x0/0x4f
 [<c01904f5>] journal_flush+0x1c3/0x1f0
 [<c0373a72>] schedule+0x30e/0x50c
 [<e09ea42f>] ocfs_commit_cache+0x1ab/0x51b [ocfs2]
 [<c01188de>] daemonize+0xd5/0xda
 [<e09eef8b>] ocfs_commit_thread+0x117/0x46f [ocfs2]
 [<c0114708>] default_wake_function+0x0/0x26
 [<c0114708>] default_wake_function+0x0/0x26
 [<c0114708>] default_wake_function+0x0/0x26
 [<c0114708>] default_wake_function+0x0/0x26
 [<e0a002ed>] ocfs_timeout_func+0x0/0x1e [ocfs2]
 [<c0118f21>] exit_notify+0x2b0/0x772
 [<c0373796>] schedule+0x32/0x50c
 [<c01196a8>] do_exit+0x2c5/0x34d
 [<c0105e8e>] work_resched+0x5/0x16
 [<e09eee74>] ocfs_commit_thread+0x0/0x46f [ocfs2]
 [<e09eee74>] ocfs_commit_thread+0x0/0x46f [ocfs2]
 [<c0104291>] kernel_thread_helper+0x5/0xb

Code: 0f 0b 8f 0a 99 6d 39 c0 e9 cb fe ff ff 0f 0b 8e 0a 99 6d 39

 ------------[ cut here ]------------
kernel BUG at fs/buffer.c:2703!
invalid operand: 0000 [#3]
CPU:    0
EIP:    0060:[<c014bb80>]    Not tainted
EFLAGS: 00010202   (2.6.6kdb)
EIP is at submit_bh+0x153/0x16d
eax: 00000010   ebx: db8cb000   ecx: d3044888   edx: d3044888
esi: d3044888   edi: 0000010d   ebp: d8064d80   esp: dd0a5c4c

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
  2004-08-24 20:04 [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation Ling, Xiaofeng
@ 2004-08-25 12:39 ` Mark Fasheh
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Fasheh @ 2004-08-25 12:39 UTC (permalink / raw)
  To: ocfs2-devel

On Wed, Aug 25, 2004 at 09:04:39AM +0800, Ling, Xiaofeng wrote:
> So what we see is not the same. 
> The bug 116 is easy to reproduce, just do a command line as
> root#mkdir /ocfs/aaa;umount /ocfs
> follow the umount to mkdir tightly.
Ok, I think I see what's going on here.

if I do this:

mkdir /ocfs2/foobar/baz; umount /ocfs2

assuming that foobar was created a while ago so the commit thread has no
work currently then the umount goes through fine, we wait on the commit
thread and he flushes everything.

however,

mkdir /ocfs/foo; umount /ocfs2

will crash things almost immediately. What's the only difference there?
Basically that the root inode is in the commit thread on the second case.

What I believe is happening is that umount succeeds without calling
clear_inode on the root inode because we still have a reference on it, so
none of the actual work for umounting has been done, yet umount exits.
That'd explain the busy inodes message we always get and the random crashes
in one or the other ocfs2 thread.

Can you try this patch and see if it reproduces? Running with this, I can't
get it to crash any more :) We did dismount from clear_inode for historical
reasons having to do with some data structures we used to alloc on the
inodes. We no longer have those issues due to our sane handling of inode
private nowadays, so I think it's fine to actually umount from our put_super
method like other file systems :)
	--Mark

--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

Index: super.c
===================================================================
--- super.c	(revision 1383)
+++ super.c	(working copy)
@@ -712,7 +712,8 @@ static void ocfs_put_super (struct super
 	LOG_ENTRY_ARGS ("(0x%p)\n", sb);
 
 	ocfs_sync_blockdev(sb);
-	LOG_TRACE_STR ("put super... do nothing!  DONE!!!!");
+	ocfs_dismount_volume (sb);
+
 	LOG_EXIT ();
 
 	LOG_CLEAR_CONTEXT();
Index: inode.c
===================================================================
--- inode.c	(revision 1383)
+++ inode.c	(working copy)
@@ -757,15 +757,6 @@ void ocfs_clear_inode (struct inode *ino
 	ocfs_extent_map_destroy (&OCFS_I(inode)->ip_ext_map);
 	ocfs_extent_map_init (&OCFS_I(inode)->ip_ext_map);
 
-	if (inode == osb->root_inode) {
-		LOG_TRACE_STR("this is the root inode, doing cleanup now!");
-		ocfs_sync_blockdev(inode->i_sb);
-		LOG_TRACE_STR ("syncing past root inode");
-		LOG_TRACE_STR ("calling dismount");
-		ocfs_dismount_volume (inode->i_sb);
-		goto bail;
-	}
-
 	down(&recovery_list_sem);
 	list_del(&OCFS_I(inode)->ip_recovery_list);
 	up(&recovery_list_sem);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
@ 2004-08-26  1:43 Ling, Xiaofeng
  2004-08-26  2:46 ` Mark Fasheh
  0 siblings, 1 reply; 8+ messages in thread
From: Ling, Xiaofeng @ 2004-08-26  1:43 UTC (permalink / raw)
  To: ocfs2-devel

Ok. seems put ocfs_dismount_volume in put_super can resolve half of the =
problem.
But there still have busy inode when rmdir and umount at once. see =
message:
"VFS:Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice =
day..."
rmmod and re-insert will cause segmentation fault due the slab problem.
for mkdir, it may also happen, but not every time.



>-----Original Message-----
>From: Mark Fasheh [mailto:mark.fasheh@oracle.com]=20
>Sent: 2004=C4=EA8=D4=C226=C8=D5 1:40
>To: Ling, Xiaofeng
>Cc: ocfs2-devel@oss.oracle.com
>Subject: Re: [Ocfs2-devel] [PATCH] fix big 116 -- umount cause=20
>crash after some operation
>
>On Wed, Aug 25, 2004 at 09:04:39AM +0800, Ling, Xiaofeng wrote:
>> So what we see is not the same.=20
>> The bug 116 is easy to reproduce, just do a command line as
>> root#mkdir /ocfs/aaa;umount /ocfs
>> follow the umount to mkdir tightly.
>Ok, I think I see what's going on here.
>
>if I do this:
>
>mkdir /ocfs2/foobar/baz; umount /ocfs2
>
>assuming that foobar was created a while ago so the commit=20
>thread has no
>work currently then the umount goes through fine, we wait on the commit
>thread and he flushes everything.
>
>however,
>
>mkdir /ocfs/foo; umount /ocfs2
>
>will crash things almost immediately. What's the only difference there?
>Basically that the root inode is in the commit thread on the=20
>second case.
>
>What I believe is happening is that umount succeeds without calling
>clear_inode on the root inode because we still have a=20
>reference on it, so
>none of the actual work for umounting has been done, yet umount exits.
>That'd explain the busy inodes message we always get and the=20
>random crashes
>in one or the other ocfs2 thread.
>
>Can you try this patch and see if it reproduces? Running with=20
>this, I can't
>get it to crash any more :) We did dismount from clear_inode=20
>for historical
>reasons having to do with some data structures we used to alloc on the
>inodes. We no longer have those issues due to our sane=20
>handling of inode
>private nowadays, so I think it's fine to actually umount from=20
>our put_super
>method like other file systems :)
>	--Mark
>
>--
>Mark Fasheh
>Software Developer, Oracle Corp
>mark.fasheh@oracle.com
>
>Index: super.c
>=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>--- super.c	(revision 1383)
>+++ super.c	(working copy)
>@@ -712,7 +712,8 @@ static void ocfs_put_super (struct super
> 	LOG_ENTRY_ARGS ("(0x%p)\n", sb);
>=20
> 	ocfs_sync_blockdev(sb);
>-	LOG_TRACE_STR ("put super... do nothing!  DONE!!!!");
>+	ocfs_dismount_volume (sb);
>+
> 	LOG_EXIT ();
>=20
> 	LOG_CLEAR_CONTEXT();
>Index: inode.c
>=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>--- inode.c	(revision 1383)
>+++ inode.c	(working copy)
>@@ -757,15 +757,6 @@ void ocfs_clear_inode (struct inode *ino
> 	ocfs_extent_map_destroy (&OCFS_I(inode)->ip_ext_map);
> 	ocfs_extent_map_init (&OCFS_I(inode)->ip_ext_map);
>=20
>-	if (inode =3D=3D osb->root_inode) {
>-		LOG_TRACE_STR("this is the root inode, doing=20
>cleanup now!");
>-		ocfs_sync_blockdev(inode->i_sb);
>-		LOG_TRACE_STR ("syncing past root inode");
>-		LOG_TRACE_STR ("calling dismount");
>-		ocfs_dismount_volume (inode->i_sb);
>-		goto bail;
>-	}
>-
> 	down(&recovery_list_sem);
> 	list_del(&OCFS_I(inode)->ip_recovery_list);
> 	up(&recovery_list_sem);
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
  2004-08-26  1:43 Ling, Xiaofeng
@ 2004-08-26  2:46 ` Mark Fasheh
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Fasheh @ 2004-08-26  2:46 UTC (permalink / raw)
  To: ocfs2-devel

On Thu, Aug 26, 2004 at 02:43:25PM +0800, Ling, Xiaofeng wrote:
> Ok. seems put ocfs_dismount_volume in put_super can resolve half of the
> problem.
Excellent.

> But there still have busy inode when rmdir and umount at once. see message:
> "VFS:Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day..."
> rmmod and re-insert will cause segmentation fault due the slab problem.
> for mkdir, it may also happen, but not every time.
Ok, so now it sounds like we're leaking inodes on delete. Just to be sure,
you don't still see our kernel threads alive after a quick rmdir;umount?
	--Mark
 
> >-----Original Message-----
> >From: Mark Fasheh [mailto:mark.fasheh@oracle.com] 
> >Sent: 2004??8??26?? 1:40
> >To: Ling, Xiaofeng
> >Cc: ocfs2-devel@oss.oracle.com
> >Subject: Re: [Ocfs2-devel] [PATCH] fix big 116 -- umount cause 
> >crash after some operation
> >
> >On Wed, Aug 25, 2004 at 09:04:39AM +0800, Ling, Xiaofeng wrote:
> >> So what we see is not the same. 
> >> The bug 116 is easy to reproduce, just do a command line as
> >> root#mkdir /ocfs/aaa;umount /ocfs
> >> follow the umount to mkdir tightly.
> >Ok, I think I see what's going on here.
> >
> >if I do this:
> >
> >mkdir /ocfs2/foobar/baz; umount /ocfs2
> >
> >assuming that foobar was created a while ago so the commit 
> >thread has no
> >work currently then the umount goes through fine, we wait on the commit
> >thread and he flushes everything.
> >
> >however,
> >
> >mkdir /ocfs/foo; umount /ocfs2
> >
> >will crash things almost immediately. What's the only difference there?
> >Basically that the root inode is in the commit thread on the 
> >second case.
> >
> >What I believe is happening is that umount succeeds without calling
> >clear_inode on the root inode because we still have a 
> >reference on it, so
> >none of the actual work for umounting has been done, yet umount exits.
> >That'd explain the busy inodes message we always get and the 
> >random crashes
> >in one or the other ocfs2 thread.
> >
> >Can you try this patch and see if it reproduces? Running with 
> >this, I can't
> >get it to crash any more :) We did dismount from clear_inode 
> >for historical
> >reasons having to do with some data structures we used to alloc on the
> >inodes. We no longer have those issues due to our sane 
> >handling of inode
> >private nowadays, so I think it's fine to actually umount from 
> >our put_super
> >method like other file systems :)
> >	--Mark
> >
> >--
> >Mark Fasheh
> >Software Developer, Oracle Corp
> >mark.fasheh@oracle.com
> >
> >Index: super.c
> >===================================================================
> >--- super.c	(revision 1383)
> >+++ super.c	(working copy)
> >@@ -712,7 +712,8 @@ static void ocfs_put_super (struct super
> > 	LOG_ENTRY_ARGS ("(0x%p)\n", sb);
> > 
> > 	ocfs_sync_blockdev(sb);
> >-	LOG_TRACE_STR ("put super... do nothing!  DONE!!!!");
> >+	ocfs_dismount_volume (sb);
> >+
> > 	LOG_EXIT ();
> > 
> > 	LOG_CLEAR_CONTEXT();
> >Index: inode.c
> >===================================================================
> >--- inode.c	(revision 1383)
> >+++ inode.c	(working copy)
> >@@ -757,15 +757,6 @@ void ocfs_clear_inode (struct inode *ino
> > 	ocfs_extent_map_destroy (&OCFS_I(inode)->ip_ext_map);
> > 	ocfs_extent_map_init (&OCFS_I(inode)->ip_ext_map);
> > 
> >-	if (inode == osb->root_inode) {
> >-		LOG_TRACE_STR("this is the root inode, doing 
> >cleanup now!");
> >-		ocfs_sync_blockdev(inode->i_sb);
> >-		LOG_TRACE_STR ("syncing past root inode");
> >-		LOG_TRACE_STR ("calling dismount");
> >-		ocfs_dismount_volume (inode->i_sb);
> >-		goto bail;
> >-	}
> >-
> > 	down(&recovery_list_sem);
> > 	list_del(&OCFS_I(inode)->ip_recovery_list);
> > 	up(&recovery_list_sem);
> >
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-08-26  2:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-24 20:04 [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation Ling, Xiaofeng
2004-08-25 12:39 ` Mark Fasheh
  -- strict thread matches above, loose matches on Subject: below --
2004-08-26  1:43 Ling, Xiaofeng
2004-08-26  2:46 ` Mark Fasheh
2004-08-23 22:42 Ling, Xiaofeng
2004-08-24 13:07 ` Mark Fasheh
2004-08-23  3:05 Ling, Xiaofeng
2004-08-23 14:16 ` Mark Fasheh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.