[Ocfs2-devel] Bug in OCFS2 umount code

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] Bug in OCFS2 umount code
@ 2008-10-24 21:57 Jan Kara
  2008-10-24 22:40 ` Joel Becker
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2008-10-24 21:57 UTC (permalink / raw)
  To: ocfs2-devel

  Hello,

  while playing with quota support I found two bugs in OCFS2 mount/umount
code. The first problem is, that if mount fails, we call
ocfs2_dismount_volume(). That is fine but after fill_super() returns an
error, VFS calls sync_fs() and ocfs2_put_super() again - not a good idea.
We usually oops...
  Another problem is oops with the following backtrace:
#0  spin_bug (lock=0x61e9e530, msg=0x6031aa23 "bad magic")
    at include/linux/sched.h:1388
#1  0x00000000601ece1d in _raw_spin_lock (lock=0x61de1520)
    at lib/spinlock_debug.c:78
#2  0x0000000060256a91 in _spin_lock (lock=0x61de1520) at
kernel/spinlock.c:181
#3  0x0000000060122f92 in jbd2_journal_release_jbd_inode (
    journal=<value optimized out>, jinode=0x6103c920) at
fs/jbd2/journal.c:2247
#4  0x0000000060178399 in ocfs2_clear_inode (inode=0x6103c638)
    at fs/ocfs2/inode.c:1119
#5  0x00000000600866c3 in clear_inode (inode=0x6103c638) at fs/inode.c:269
#6  0x00000000600869ef in generic_drop_inode (inode=0x6103c638)
    at fs/inode.c:1100
#7  0x00000000601784c5 in ocfs2_drop_inode (inode=0x6103c638)
    at fs/ocfs2/inode.c:1141
#8  0x0000000060085bc1 in iput (inode=0x6103c638) at fs/inode.c:1138
#9  0x000000006018e5fd in ocfs2_release_system_inodes (osb=0x61f3e600)
    at fs/ocfs2/super.c:337
#10 0x000000006019050b in ocfs2_dismount_volume (sb=0x61e9eaf8, mnt_err=0)
    at fs/ocfs2/super.c:1599
#11 0x0000000060190a63 in ocfs2_put_super (sb=0x61e9eaf8)
    at fs/ocfs2/super.c:1334

  I guess the cause is that ocfs2_clear_inode() gets called after
the journal is freed. Well, jbd2_journal_release_jbd_inode() handles
this but only if you pass NULL instead of journal pointer in such case...
But OCFS2 obviously passes some already invalid pointer.
  I'd fix these bugs myself but I'll be travelling next few days and
probably won't get to it so I'm just reporting them.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] Bug in OCFS2 umount code
  2008-10-24 21:57 [Ocfs2-devel] Bug in OCFS2 umount code Jan Kara
@ 2008-10-24 22:40 ` Joel Becker
  2008-10-24 22:45   ` Joel Becker
  2008-10-24 22:48   ` Jan Kara
  0 siblings, 2 replies; 5+ messages in thread
From: Joel Becker @ 2008-10-24 22:40 UTC (permalink / raw)
  To: ocfs2-devel

On Fri, Oct 24, 2008 at 11:57:02PM +0200, Jan Kara wrote:
>   while playing with quota support I found two bugs in OCFS2 mount/umount
> code. The first problem is, that if mount fails, we call
> ocfs2_dismount_volume(). That is fine but after fill_super() returns an
> error, VFS calls sync_fs() and ocfs2_put_super() again - not a good idea.
> We usually oops...

	I was trying to figure out why this happens, given that we've
never seen it before and we test failed mounts all the time (no cluster
stack, bad mount argument, bad features...).
	The vfs only calls ->sync_fs() and ->put_super() if sb->s_root
exists.  So we cannot fail a mount after we've set sb->s_root.  If you
check ocfs2_fill_super(), you'll see that after we sb->s_root, we will
continue on to success (we return 'status', but status will be 0).
	After we've set s_root, we must complete our mount tasks in
fill_super().  If you want to fail after that, you need to complete the
mount tasks but return non-zero status.  Then the VFS code will follow
to ->put_super() like you say and ocfs2_dismount_volume() will only be
called once.
	So I don't see a bug in the code as I have it here in my tree.
I'm going to go check your quota patches to see if perhaps you jump to
read_super_error after s_root is set.  If so, we can just reorganize
that.  If not, we'll have to figure out what particular mount failure is
causing your problem, so we can see what's happening.

>   Another problem is oops with the following backtrace:
> #0  spin_bug (lock=0x61e9e530, msg=0x6031aa23 "bad magic")
>     at include/linux/sched.h:1388
> #1  0x00000000601ece1d in _raw_spin_lock (lock=0x61de1520)
>     at lib/spinlock_debug.c:78
> #2  0x0000000060256a91 in _spin_lock (lock=0x61de1520) at
> kernel/spinlock.c:181
> #3  0x0000000060122f92 in jbd2_journal_release_jbd_inode (
>     journal=<value optimized out>, jinode=0x6103c920) at
> fs/jbd2/journal.c:2247
> #4  0x0000000060178399 in ocfs2_clear_inode (inode=0x6103c638)
>     at fs/ocfs2/inode.c:1119
> #5  0x00000000600866c3 in clear_inode (inode=0x6103c638) at fs/inode.c:269
> #6  0x00000000600869ef in generic_drop_inode (inode=0x6103c638)
>     at fs/inode.c:1100
> #7  0x00000000601784c5 in ocfs2_drop_inode (inode=0x6103c638)
>     at fs/ocfs2/inode.c:1141
> #8  0x0000000060085bc1 in iput (inode=0x6103c638) at fs/inode.c:1138
> #9  0x000000006018e5fd in ocfs2_release_system_inodes (osb=0x61f3e600)
>     at fs/ocfs2/super.c:337
> #10 0x000000006019050b in ocfs2_dismount_volume (sb=0x61e9eaf8, mnt_err=0)
>     at fs/ocfs2/super.c:1599
> #11 0x0000000060190a63 in ocfs2_put_super (sb=0x61e9eaf8)
>     at fs/ocfs2/super.c:1334
> 
>   I guess the cause is that ocfs2_clear_inode() gets called after
> the journal is freed. Well, jbd2_journal_release_jbd_inode() handles
> this but only if you pass NULL instead of journal pointer in such case...
> But OCFS2 obviously passes some already invalid pointer.

	This is already known and the fix will be going upstream
shortly.

Joel

-- 

"Win95 file and print sharing are for relatively friendly nets."
	- Paul Leach, Microsoft

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] Bug in OCFS2 umount code
  2008-10-24 22:40 ` Joel Becker
@ 2008-10-24 22:45   ` Joel Becker
  2008-10-24 22:48   ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Joel Becker @ 2008-10-24 22:45 UTC (permalink / raw)
  To: ocfs2-devel

On Fri, Oct 24, 2008 at 03:40:25PM -0700, Joel Becker wrote:
> On Fri, Oct 24, 2008 at 11:57:02PM +0200, Jan Kara wrote:
> >   while playing with quota support I found two bugs in OCFS2 mount/umount
> > code. The first problem is, that if mount fails, we call
> > ocfs2_dismount_volume(). That is fine but after fill_super() returns an
> > error, VFS calls sync_fs() and ocfs2_put_super() again - not a good idea.
> > We usually oops...
<snip> 
> 	So I don't see a bug in the code as I have it here in my tree.
> I'm going to go check your quota patches to see if perhaps you jump to
> read_super_error after s_root is set.  If so, we can just reorganize
> that.  If not, we'll have to figure out what particular mount failure is
> causing your problem, so we can see what's happening.

	Yeah, you're bailing after setting s_root:

@@ -745,6 +899,13 @@ static int ocfs2_fill_super(struct super_block *sb,
void *data, int silent) 
 
        sb->s_root = root;
 
+       /* Initialize quotas before we start truncation log recovery */
+       status = ocfs2_enable_quotas(osb); 
+       if (status < 0) {
+               mlog_errno(status);
+               goto read_super_error;
+       }

	I don't see why you can't move that above the s_root line.  The
filesystem isn't accessible to userspace yet.  Then your error is safe
to jump to read_super_error.

Joel

-- 

Bram's Law:
	The easier a piece of software is to write, the worse it's
	implemented in practice.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] Bug in OCFS2 umount code
  2008-10-24 22:40 ` Joel Becker
  2008-10-24 22:45   ` Joel Becker
@ 2008-10-24 22:48   ` Jan Kara
  2008-10-25  1:54     ` Joel Becker
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2008-10-24 22:48 UTC (permalink / raw)
  To: ocfs2-devel

On Fri 24-10-08 15:40:25, Joel Becker wrote:
> On Fri, Oct 24, 2008 at 11:57:02PM +0200, Jan Kara wrote:
> >   while playing with quota support I found two bugs in OCFS2 mount/umount
> > code. The first problem is, that if mount fails, we call
> > ocfs2_dismount_volume(). That is fine but after fill_super() returns an
> > error, VFS calls sync_fs() and ocfs2_put_super() again - not a good idea.
> > We usually oops...
> 
> 	I was trying to figure out why this happens, given that we've
> never seen it before and we test failed mounts all the time (no cluster
> stack, bad mount argument, bad features...).
> 	The vfs only calls ->sync_fs() and ->put_super() if sb->s_root
> exists.  So we cannot fail a mount after we've set sb->s_root.  If you
> check ocfs2_fill_super(), you'll see that after we sb->s_root, we will
> continue on to success (we return 'status', but status will be 0).
> 	After we've set s_root, we must complete our mount tasks in
> fill_super().  If you want to fail after that, you need to complete the
> mount tasks but return non-zero status.  Then the VFS code will follow
> to ->put_super() like you say and ocfs2_dismount_volume() will only be
> called once.
> 	So I don't see a bug in the code as I have it here in my tree.
> I'm going to go check your quota patches to see if perhaps you jump to
> read_super_error after s_root is set.  If so, we can just reorganize
> that.  If not, we'll have to figure out what particular mount failure is
> causing your problem, so we can see what's happening.
  Ah, OK. I've noticed that when quota failed to turn on for some reason.
So you are probably right, that I've added an error path which was not
there. Just when I was looking at the mount code I didn't notice that
I'm doing something different from the rest of the function :) Sorry for
the noise.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] Bug in OCFS2 umount code
  2008-10-24 22:48   ` Jan Kara
@ 2008-10-25  1:54     ` Joel Becker
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Becker @ 2008-10-25  1:54 UTC (permalink / raw)
  To: ocfs2-devel

On Sat, Oct 25, 2008 at 12:48:39AM +0200, Jan Kara wrote:
> On Fri 24-10-08 15:40:25, Joel Becker wrote:
> > 	So I don't see a bug in the code as I have it here in my tree.
> > I'm going to go check your quota patches to see if perhaps you jump to
> > read_super_error after s_root is set.  If so, we can just reorganize
> > that.  If not, we'll have to figure out what particular mount failure is
> > causing your problem, so we can see what's happening.
>   Ah, OK. I've noticed that when quota failed to turn on for some reason.
> So you are probably right, that I've added an error path which was not
> there. Just when I was looking at the mount code I didn't notice that
> I'm doing something different from the rest of the function :) Sorry for
> the noise.

	It's not noise.  We didn't know until we checked it out :-)
Don't worry about bothering us.

Joel

-- 

"Here's a nickle -- get yourself a better X server."
	- Keith Packard

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-10-25  1:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-24 21:57 [Ocfs2-devel] Bug in OCFS2 umount code Jan Kara
2008-10-24 22:40 ` Joel Becker
2008-10-24 22:45   ` Joel Becker
2008-10-24 22:48   ` Jan Kara
2008-10-25  1:54     ` Joel Becker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.