From: Mark Fasheh <mark.fasheh@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
Date: Tue Aug 24 13:07:12 2004
Message-ID: <20040824180708.GA1094@ca-server1.us.oracle.com>
In-Reply-To: <3ACA40606221794F80A5670F0AF15F8405476E8E@pdsmsx403>
On Tue, Aug 24, 2004 at 11:42:44AM +0800, Ling, Xiaofeng wrote:
> The current process is
> ocfs_clear_inode->
> ocfs_dismount_volumn->
> ocfs_journal_shutdown->
> wake_up(&osb->flush_event)
> Although this seems to release the locks, the bug still exists.
> The bug is triggered in submit_bh. In the kernel, the sequence is
> do_umount ->
> fsync_super ->
> ocfs_sync_fs, sync_blockdev, sync_inodes_sb ->
> ocfs_clear_inode
> So I guess the bug is triggered in sync_blockdev, before ocfs_clear_inode runs, which means doing the cleanup in ocfs_clear_inode is too late.
> I've tried moving ocfs_dismount_volumn or ocfs_journal_shutdown into ocfs_sync_fs, but that causes other problems,
> whereas just calling ocfs_commit_cache there works.
> If you don't want an extra call to ocfs_commit_cache, we need to find a better place for ocfs_dismount_volumn.
Do you have a kernel stack trace? I want to make sure we're talking about
the same bug(s) here. The first one I dealt with was simply that the commit
thread was trying to release the locks with a signal pending, which caused
the code in dlm.c to freak out. I removed signal handling from it altogether
and now just use our ocfs_wait function instead. This seemed to alleviate
that problem.
Issue #2 I see during umount is that occasionally the ocfs2 nm thread will
crash in __make_request:
------------[ cut here ]------------
kernel BUG at ll_rw_blk.c:1014!
invalid operand: 0000
ocfs2 nfs lockd sunrpc parport_pc lp parport netconsole autofs tg3 floppy sg
loop lvm-mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd qla2300 aic7xxx
CPU: 6
EIP: 0060:[<c01ccad1>] Not tainted
EFLAGS: 00010246
EIP is at __make_request [kernel] 0x81 (2.4.21-15.ELsmp/i686)
eax: 00000800 ebx: 00000000 ecx: 00000b40 edx: 00000008
esi: f3cbceec edi: f6285618 ebp: 00000b40 esp: c6f6fdc4
ds: 0068 es: 0068 ss: 0068
Process ocfs2nm-0 (pid: 9164, stackpage=c6f6f000)
Stack: f6285600 c0435180 00000006 c53ee000 c6f6fe18 c0123274 c0436680 c6f6e000
       c53ee000 000000ff 00000800 f6285640 c6f6e000 c0436680 00000000 00000008
       00000b40 00000006 f3cbceec 00000008 013fe5b9 00000b40 c01cd22a f6285618
Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xc6f6fdd8)
[<c01cd22a>] generic_make_request [kernel] 0xea (0xc6f6fe1c)
[<c01cd2c9>] submit_bh_rsector [kernel] 0x49 (0xc6f6fe44)
[<f8b05bbd>] ocfs_read_bhs [ocfs2] 0x641 (0xc6f6fe60)
[<c013334b>] del_timer_sync [kernel] 0x1b (0xc6f6fe94)
[<f8b1c418>] ocfs_volume_thread [ocfs2] 0x25c (0xc6f6fec0)
[<c0122450>] load_balance [kernel] 0x30 (0xc6f6ff0c)
[<c013dd94>] clear_page_tables [kernel] 0x64 (0xc6f6ff24)
[<c0123274>] schedule [kernel] 0x2f4 (0xc6f6ff4c)
[<c012c4c1>] exit_notify [kernel] 0x121 (0xc6f6ff70)
[<c012c9c0>] do_exit [kernel] 0x370 (0xc6f6ff90)
[<f8b1c1bc>] ocfs_volume_thread [ocfs2] 0x0 (0xc6f6ffe0)
[<c010958d>] kernel_thread_helper [kernel] 0x5 (0xc6f6fff0)
Code: 0f 0b f6 03 7c d4 2b c0 89 74 24 08 89 5c 24 04 89 3c 24 e8
I'm still trying to track this one down. If we're still not on the same
page, some more explanation would be helpful :) Thanks,
--Mark
>
>
> >-----Original Message-----
> >From: Mark Fasheh [mailto:mark.fasheh@oracle.com]
> >Sent: Aug 24, 2004 3:17
> >To: Ling, Xiaofeng
> >Cc: ocfs2-devel@oss.oracle.com
> >Subject: Re: [Ocfs2-devel] [PATCH] fix big 116 -- umount cause crash after some operation
> >
> >On Mon, Aug 23, 2004 at 04:05:57PM +0800, Ling, Xiaofeng wrote:
> >> Before umount, all the locks should be released first.
> >> This patch resolves the problem.
> >This isn't the way we want to handle this. The way it's supposed to work
> >is that the umount process sends a signal to the commit thread to shut
> >down and then waits on its last set of checkpoints / releases. I'd much
> >rather fix what's broken than patch over it :)
> >
> >I'm actually looking at this bug right now and I think it's due to our
> >signal handling code...
> > --Mark
> >
> >>
> >> ------------------------------------------------
> >> Index: super.c
> >> ===================================================================
> >> --- super.c (revision 1370)
> >> +++ super.c (working copy)
> >> @@ -224,6 +224,7 @@
> >> tid_t target;
> >>
> >> sb->s_dirt = 0;
> >> + ocfs_commit_cache(OCFS2_SB(sb));
> >> target = log_start_commit(OCFS2_SB(sb)->journal->k_journal, NULL);
> >> log_wait_commit(OCFS2_SB(sb)->journal->k_journal, target);
> >> return 0;
> >> @@ -234,6 +235,7 @@
> >> tid_t target;
> >>
> >> sb->s_dirt = 0;
> >> + ocfs_commit_cache(OCFS2_SB(sb));
> >> if (journal_start_commit(OCFS2_SB(sb)->journal->k_journal, &target))
> >> {
> >> if (wait)
> >> log_wait_commit(OCFS2_SB(sb)->journal->k_journal,
> >> Index: journal.c
> >> ===================================================================
> >> --- journal.c (revision 1370)
> >> +++ journal.c (working copy)
> >> @@ -61,7 +61,7 @@
> >> struct inode *inode);
> >> static int ocfs_recover_node(struct _ocfs_super *osb, int node_num);
> >> static int __ocfs_recovery_thread(void *arg);
> >> -static int ocfs_commit_cache (ocfs_super * osb);
> >> +int ocfs_commit_cache (ocfs_super * osb);
> >> static int ocfs_wait_on_mount(ocfs_super *osb);
> >> static void ocfs_handle_move_locks(ocfs_journal *journal,
> >> ocfs_journal_handle *handle);
> >> @@ -149,7 +149,7 @@
> >> * This is in journal.c for lack of a better place.
> >> *
> >> */
> >> -static int ocfs_commit_cache(ocfs_super *osb)
> >> +int ocfs_commit_cache(ocfs_super *osb)
> >> {
> >> int status = 0, tmpstat;
> >> ocfs_journal * journal = NULL;
> >>
> >>
> >> -------------------
> >> Ling Xiaofeng(Daniel)
> >>
> >> Intel China Software Lab.
> >> iNet: 8-752-1243
> >> 8621-52574545-1243(O)
> >>
> >> xfling@users.sourceforge.net
> >> Opinions are my own and don't represent those of my employer
> >--
> >Mark Fasheh
> >Software Developer, Oracle Corp
> >mark.fasheh@oracle.com
> >
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com