sget() misuse in nilfs

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* sget() misuse in nilfs
@ 2009-05-03 22:51 Al Viro
  2009-05-04 17:11 ` Ryusuke Konishi
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2009-05-03 22:51 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-kernel, linux-fsdevel

	OK, I give up; what _is_ get_sb/remount code supposed to implement?
Not to mention the reliability of down_read_trylock() in there (what happens
if somebody tries to e.g. remount fs we are looking at the time?  That's
right, s_umount held exclusive, down_read_trulock() failing, fs instance
skipped during the search), what is all that stuff trying to achieve?

	What protects MS_RDONLY in sd->s_flags during these searches?
What makes parse_options() in remount straight into sb (with reversal
if we'd done something bad) safe?  Do we ever reassign snapshot_cno
other than on sb creation and remounting between r/o and r/w?  Is there
any reason why we free sbi early (== in put_super()) and not after
kill_block_super() in ->kill_sb() of your own?  That alone would make
sbi stay with superblock for as long as it could be found by any
means, killing the locking mess in your test callbacks.  Can SNAPSHOT
even be there unless you have MS_RDONLY?

	And what are the rules for exclusion in case of r/w mounts?
What, do we get -EBUSY on attempt to mount r/w something that is
already mounted r/w (instead of simply sharing superblock, as other
filesystems would do)?  Or am I misreading that
        } else if (!(s->s_flags & MS_RDONLY)) {
                err = -EBUSY;
        }
in there?

						Very confused Al...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sget() misuse in nilfs
  2009-05-03 22:51 sget() misuse in nilfs Al Viro
@ 2009-05-04 17:11 ` Ryusuke Konishi
  2009-05-05  8:18   ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Ryusuke Konishi @ 2009-05-04 17:11 UTC (permalink / raw)
  To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel

Hi,
On Sun, 3 May 2009 23:51:36 +0100, Al Viro wrote:
> OK, I give up; what _is_ get_sb/remount code supposed to implement?

Oh, these functions lack spec comments.

I first explain some specs. (I think part of them should be added on
the code, later)

The nilfs_get_sb() allocates a new super_block struct (sb) or assigns
an existing sb to the mount point through sget().  For newly allocated
sb it calls nilfs_fill_super() for initialization.

The following things are supposed here:

 1) Every rw-mount on a device shares the same sb (as usual).

 2) Every sb of snapshot is independent with that of rw-mount or other
    snapshots if their checkpoint numbers differ.  On the other hand,
    two or more snapshots having the same checkpoint number share a sb
    wherever possible.

 3) Snapshots are mountable concurrently with a rw-mount, but a
    ro-mount is not so because the ro-mount cannot follow changes
    brought by the rw-mount.

 4) Every sb on a device shares the same nilfs struct
    (i.e. the_nilfs), which stores their common information.

    The existing nilfs struct is acquired from an existing sb.
    Otherwise, a new nilfs struct is allocated through alloc_nilfs()
    function.  The lifetime of the nilfs struct is managed with a
    reference counter.

The nilfs_remount() function follows the change by given mount
options.  This does not replace sb, so it involves the following
actions and confirmation:

 a) A segment constructor (i.e. log writer of nilfs) is invoked on
    ro->rw remount, and it is shut down for ro->rw remount.

 b) ro->rw remount is possible only if there is no rw-mount on the
    device and the checkpoint number of the ro-mount is latest.

 c) Remounting snapshot to different checkpoints or rw-mount is not
    allowed.

> Not to mention the reliability of down_read_trylock() in there (what
> happens if somebody tries to e.g. remount fs we are looking at the
> time?  That's right, s_umount held exclusive, down_read_trulock()
> failing, fs instance skipped during the search), what is all that
> stuff trying to achieve?

It tries to find out the existing snapshot instance having a
checkpoint number which equals with the specified cp number (2).

Yeah, it may fail in the situation.  I allow the miss at present
because it's harmless and rare though it wastes memory due to unwanted
duplication of the snapshot instance.

> What protects MS_RDONLY in sd->s_flags during these searches?

sd->s_flags is a local variable of the call sites of sget(), but, ugh,
s->s_flags is not protected.  It may have problem...

BTW, MS_RDONLY in s->s_flags is also referred to in get_sb_bdev() and
do_remount_sb().  How is the protection achieved for these functions?

For example, ext3_remount() or ext3_handle_error() manipulates this
flag, and get_sb_bdev() or do_remount_sb() could see it for other
existing fs-instances.

> What makes parse_options() in remount straight into sb (with reversal
> if we'd done something bad) safe?

Well, it's not protected except lock_kernel() in the call sites.
I should add an argument to the parse_options() in order to forbid
destructive changes like ext3/4 does.

> Do we ever reassign snapshot_cno
> other than on sb creation and remounting between r/o and r/w?

We don't allow reassigning the snapshot_cno.
But, ugh, nilfs_remount() is rewriting it temporarily. It seems bad.
I'll fix it.

> Is there any reason why we free sbi early (== in put_super()) and
> not after kill_block_super() in ->kill_sb() of your own?

No, just I didn't think we have to delay it until ->kill_sb() because
bdev->bd_mount_sem and s->s_umount are protecting ->s_fs_info until
removing the sb from the type->fs_supers list and the super_blocks
list.

> That alone would make sbi stay with superblock for as long as it
> could be found by any means, killing the locking mess in your test
> callbacks.

If the advice turned out to be required or help for simplification,
I'll take it in.  I need some time to think about it.

> Can SNAPSHOT even be there unless you have MS_RDONLY?

Yes, it can.  Nilfs snapshots can exist concurrently with rw-mount.

> And what are the rules for exclusion in case of r/w mounts?

(answered in the first explanation)

> What, do we get -EBUSY on attempt to mount r/w something that is
> already mounted r/w (instead of simply sharing superblock, as other
> filesystems would do)?
>
> Or am I misreading that
>         } else if (!(s->s_flags & MS_RDONLY)) {
>                 err = -EBUSY;
>         }
> in there?

Hmm, this looks strange to me, too.  Nilfs is sharing a r/w-mount as
described in (1).  I'll confirm the -EBUSY case.

> 						Very confused Al...

Sorry.  I recongize these are rather confusing.
The use of sget() and the test routines should be reduced to a minimum
if feasible.

Anyway, thanks for many questions and comments.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sget() misuse in nilfs
  2009-05-04 17:11 ` Ryusuke Konishi
@ 2009-05-05  8:18   ` Al Viro
  2009-05-05 15:37     ` Ryusuke Konishi
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2009-05-05  8:18 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-kernel, linux-fsdevel

On Tue, May 05, 2009 at 02:11:29AM +0900, Ryusuke Konishi wrote:
> Hi,
> On Sun, 3 May 2009 23:51:36 +0100, Al Viro wrote:
> > OK, I give up; what _is_ get_sb/remount code supposed to implement?
> 
> Oh, these functions lack spec comments.
> 
> I first explain some specs. (I think part of them should be added on
> the code, later)
> 
> The nilfs_get_sb() allocates a new super_block struct (sb) or assigns
> an existing sb to the mount point through sget().  For newly allocated
> sb it calls nilfs_fill_super() for initialization.
> 
> The following things are supposed here:

>  1) Every rw-mount on a device shares the same sb (as usual).

OK.  That's the first kind of sb; no MS_RDONLY, no SNAPSHOT, snapshot_cno is 0.

>  2) Every sb of snapshot is independent with that of rw-mount or other
>     snapshots if their checkpoint numbers differ.  On the other hand,
>     two or more snapshots having the same checkpoint number share a sb
>     wherever possible.

Umm...  That's what, MS_RDONLY, SNAPSHOT, snapshot_cno > 0?

>  3) Snapshots are mountable concurrently with a rw-mount, but a
>     ro-mount is not so because the ro-mount cannot follow changes
>     brought by the rw-mount.

And ro-mount would be MS_RDONLY, no SNAPSHOT, snapshot_cno equal to the
nilfs_last_cno()?
 
>  b) ro->rw remount is possible only if there is no rw-mount on the
>     device and the checkpoint number of the ro-mount is latest.

Er...  How could there be an rw-mount while we have ro one?  Your
(3) above would seem to prohibit that situation...

>  c) Remounting snapshot to different checkpoints or rw-mount is not
>     allowed.

Where is the second part checked in the current code?
 
> > Can SNAPSHOT even be there unless you have MS_RDONLY?
> 
> Yes, it can.  Nilfs snapshots can exist concurrently with rw-mount.

On the same superblock, that is...  And AFAICS the answer's "no, it can't"
(we can have rw superblock and snapshot superblock at the same time, but
those will be different instances of struct superblock).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sget() misuse in nilfs
  2009-05-05  8:18   ` Al Viro
@ 2009-05-05 15:37     ` Ryusuke Konishi
  2009-05-05 16:37       ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Ryusuke Konishi @ 2009-05-05 15:37 UTC (permalink / raw)
  To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel

On Tue, 5 May 2009 09:18:11 +0100, Al Viro wrote:
> On Tue, May 05, 2009 at 02:11:29AM +0900, Ryusuke Konishi wrote:
> > Hi,
> > On Sun, 3 May 2009 23:51:36 +0100, Al Viro wrote:
> > > OK, I give up; what _is_ get_sb/remount code supposed to implement?
> > 
> > Oh, these functions lack spec comments.
> > 
> > I first explain some specs. (I think part of them should be added on
> > the code, later)
> > 
> > The nilfs_get_sb() allocates a new super_block struct (sb) or assigns
> > an existing sb to the mount point through sget().  For newly allocated
> > sb it calls nilfs_fill_super() for initialization.
> > 
> > The following things are supposed here:
> 
> >  1) Every rw-mount on a device shares the same sb (as usual).
> 
> OK.  That's the first kind of sb; no MS_RDONLY, no SNAPSHOT, snapshot_cno is 0.
> 
> >  2) Every sb of snapshot is independent with that of rw-mount or other
> >     snapshots if their checkpoint numbers differ.  On the other hand,
> >     two or more snapshots having the same checkpoint number share a sb
> >     wherever possible.
> 
> Umm...  That's what, MS_RDONLY, SNAPSHOT, snapshot_cno > 0?

Yes, exactly.
 
> >  3) Snapshots are mountable concurrently with a rw-mount, but a
> >     ro-mount is not so because the ro-mount cannot follow changes
> >     brought by the rw-mount.
> 
> And ro-mount would be MS_RDONLY, no SNAPSHOT, snapshot_cno equal to the
> nilfs_last_cno()?

Yes.
  
> >  b) ro->rw remount is possible only if there is no rw-mount on the
> >     device and the checkpoint number of the ro-mount is latest.
> 
> Er...  How could there be an rw-mount while we have ro one?  Your
> (3) above would seem to prohibit that situation...

Oh, meaning of the (b) was ambiguous.  How about the following one?

 b) Remounting an ro-mount to read-only is possible only if the
    checkpoint number of the target ro-mount is latest and there is no
    existent rw-mount.

> >     device and the checkpoint number of the ro-mount is latest.
 
> >  c) Remounting snapshot to different checkpoints or rw-mount is not
> >     allowed.
> 
> Where is the second part checked in the current code?

Ah, the second part was wrong.  The snapshot with latest checkpoint
number can be remounted into an rw-mount.  So it should be:

 c) Remounting a snapshot to a different checkpoint is not allowed.
    Remounting a snapshot to an rw-mount is possible only if the
    target snapshot equals to the latest checkpoint.
  
> > > Can SNAPSHOT even be there unless you have MS_RDONLY?
> > 
> > Yes, it can.  Nilfs snapshots can exist concurrently with rw-mount.
> 
> On the same superblock, that is...  And AFAICS the answer's "no, it can't"
> (we can have rw superblock and snapshot superblock at the same time, but
> those will be different instances of struct superblock).

You are right. It's my misunderstanding.

You meant SNAPSHOT for the NILFS_MOUNT_SNAPSHOT flag on
sb->s_mount_opt (it does appear as SNAPSHOT on the code).  So, the
answer is "no, it can't".

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sget() misuse in nilfs
  2009-05-05 15:37     ` Ryusuke Konishi
@ 2009-05-05 16:37       ` Al Viro
  2009-05-06  6:28         ` Ryusuke Konishi
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2009-05-05 16:37 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel

On Wed, May 06, 2009 at 12:37:29AM +0900, Ryusuke Konishi wrote:
> Oh, meaning of the (b) was ambiguous.  How about the following one?
> 
>  b) Remounting an ro-mount to read-only is possible only if the
>     checkpoint number of the target ro-mount is latest and there is no
>     existent rw-mount.
> 
>  c) Remounting a snapshot to a different checkpoint is not allowed.
>     Remounting a snapshot to an rw-mount is possible only if the
>     target snapshot equals to the latest checkpoint.

That's really rather messy...  Let's see if I've got it right:

* r/w -> r/w.  Allowed.
* r/w -> r/o.  Allowed.
* r/w -> snapshot.  Not allowed.
* snapshot -> r/w.  Allowed if it's the latest one and no r/w is there.
* snapshot -> r/o.  It remains a snapshot, but says it has succeeded.
* snapshot -> snapshot.  Only if it's the same.
* r/o -> r/w.  Allowed [1]
* r/o -> r/o.  Allowed.
* r/o -> snapshot.  Allowed only if the snapshot number is the latest.

r/w can't coexist with r/o, but can coexist with any snapshots.  Can't be
remounted to a snapshot directly, but can go through r/w->r/o->latest snapshot
in two mount -o remount.

"r/o" in the above means "read-only, SNAPSHOT flag not set".

What happens if you mount the thing r/w, remount it r/o and then try to
mount the latest snapshot?  Will that give two superblocks or will it
reuse the r/o mount?

OTOH, what will happen if you take r/w mount, mount the latest snapshot and
then remount the r/w one to r/o?

[1] there couldn't have been new r/w mount while r/o one existed, snapshot
number couldn't have changed and the only possible transition *into* r/o is
from r/w, so another r/w superblock couldn't have survived since before our
superblock has become r/o.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sget() misuse in nilfs
  2009-05-05 16:37       ` Al Viro
@ 2009-05-06  6:28         ` Ryusuke Konishi
  2009-05-06 16:09           ` Ryusuke Konishi
  0 siblings, 1 reply; 7+ messages in thread
From: Ryusuke Konishi @ 2009-05-06  6:28 UTC (permalink / raw)
  To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel

On Tue, 5 May 2009 17:37:37 +0100, Al Viro wrote:
> On Wed, May 06, 2009 at 12:37:29AM +0900, Ryusuke Konishi wrote:
> > Oh, meaning of the (b) was ambiguous.  How about the following one?
> > 
> >  b) Remounting an ro-mount to read-only is possible only if the
> >     checkpoint number of the target ro-mount is latest and there is no
> >     existent rw-mount.
> > 
> >  c) Remounting a snapshot to a different checkpoint is not allowed.
> >     Remounting a snapshot to an rw-mount is possible only if the
> >     target snapshot equals to the latest checkpoint.
> 
> That's really rather messy...  Let's see if I've got it right:
> 
> * r/w -> r/w.  Allowed.
> * r/w -> r/o.  Allowed.
> * r/w -> snapshot.  Not allowed.
> * snapshot -> r/w.  Allowed if it's the latest one and no r/w is there.
> * snapshot -> r/o.  It remains a snapshot, but says it has succeeded.

Ah, this transition was not assumed.  It needs some fix.

> * snapshot -> snapshot.  Only if it's the same.
> * r/o -> r/w.  Allowed [1]
> * r/o -> r/o.  Allowed.
> * r/o -> snapshot.  Allowed only if the snapshot number is the latest.

Look correct.

> r/w can't coexist with r/o, but can coexist with any snapshots.
> Can't be remounted to a snapshot directly, but can go through
> r/w->r/o->latest snapshot in two mount -o remount.

Hmm, right. It looks half-baked.

The transition "r/w -> latest snapshot" should be allowed to ensure
consistency.

> "r/o" in the above means "read-only, SNAPSHOT flag not set".
> 
> What happens if you mount the thing r/w, remount it r/o and then try to
> mount the latest snapshot?  Will that give two superblocks or will it
> reuse the r/o mount?

It will reuse the r/o mount, which was originally r/w mount.
 
> OTOH, what will happen if you take r/w mount, mount the latest snapshot and
> then remount the r/w one to r/o?

In that case, the latest snapshot and the r/o-mount will coexist as
two different instances.

> [1] there couldn't have been new r/w mount while r/o one existed, snapshot
> number couldn't have changed and the only possible transition *into* r/o is
> from r/w, so another r/w superblock couldn't have survived since before our
> superblock has become r/o.

I'd rather simplify things.

If we treat read-only mount as the latest snapshot at the time (though
we didn't take this interpretation), the transitions can be reduced
to:

 * r/w -> r/w.  Allowed.
 * r/w -> snapshot.  Allowed if no checkpoint number was given (or the
                     latest checkpoint was specified)
 * snapshot -> r/w.  Allowed if it's the latest one and no r/w is there.
 * snapshot -> snapshot.  Only if it's the same.

Right?

But it still needs test_exclusive_mount().

The test_exclusive_mount() may be eliminable by adding rw-mount-exists
flag on the_nilfs struct.  I'll take some thinking.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: sget() misuse in nilfs
  2009-05-06  6:28         ` Ryusuke Konishi
@ 2009-05-06 16:09           ` Ryusuke Konishi
  0 siblings, 0 replies; 7+ messages in thread
From: Ryusuke Konishi @ 2009-05-06 16:09 UTC (permalink / raw)
  To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel

Hi,
On Wed, 06 May 2009 15:28:59 +0900 (JST), Ryusuke Konishi wrote:
> If we treat read-only mount as the latest snapshot at the time (though
> we didn't take this interpretation), the transitions can be reduced
> to:
> 
>  * r/w -> r/w.  Allowed.
>  * r/w -> snapshot.  Allowed if no checkpoint number was given (or the
>                      latest checkpoint was specified)
>  * snapshot -> r/w.  Allowed if it's the latest one and no r/w is there.
>  * snapshot -> snapshot.  Only if it's the same.
> 
> Right?

Ah, I had forgotten garbage collection (GC).

GC can break checkpoints which are not marked as snapshot.  ro-mount
cannot coexist with rw-mount because GC works while an rw-mount is
there.

Sorry, the above interpretation was not easily realized.

> But it still needs test_exclusive_mount().
> 
> The test_exclusive_mount() may be eliminable by adding rw-mount-exists
> flag on the_nilfs struct.  I'll take some thinking.

The elimination of test_exclusive_mount() was possible by this method
if we can treat ro-mount as the latest checkpoint at the time.

I'd like to consider if a similiar elimination is possible in case
that ro-mount and rw-mount cannot coexist.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-05-06 16:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-03 22:51 sget() misuse in nilfs Al Viro
2009-05-04 17:11 ` Ryusuke Konishi
2009-05-05  8:18   ` Al Viro
2009-05-05 15:37     ` Ryusuke Konishi
2009-05-05 16:37       ` Al Viro
2009-05-06  6:28         ` Ryusuke Konishi
2009-05-06 16:09           ` Ryusuke Konishi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).