* sget() misuse in nilfs
@ 2009-05-03 22:51 Al Viro
2009-05-04 17:11 ` Ryusuke Konishi
0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2009-05-03 22:51 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: linux-kernel, linux-fsdevel
OK, I give up; what _is_ get_sb/remount code supposed to implement?
Not to mention the reliability of down_read_trylock() in there (what happens
if somebody tries to e.g. remount fs we are looking at the time? That's
right, s_umount held exclusive, down_read_trulock() failing, fs instance
skipped during the search), what is all that stuff trying to achieve?
What protects MS_RDONLY in sd->s_flags during these searches?
What makes parse_options() in remount straight into sb (with reversal
if we'd done something bad) safe? Do we ever reassign snapshot_cno
other than on sb creation and remounting between r/o and r/w? Is there
any reason why we free sbi early (== in put_super()) and not after
kill_block_super() in ->kill_sb() of your own? That alone would make
sbi stay with superblock for as long as it could be found by any
means, killing the locking mess in your test callbacks. Can SNAPSHOT
even be there unless you have MS_RDONLY?
And what are the rules for exclusion in case of r/w mounts?
What, do we get -EBUSY on attempt to mount r/w something that is
already mounted r/w (instead of simply sharing superblock, as other
filesystems would do)? Or am I misreading that
} else if (!(s->s_flags & MS_RDONLY)) {
err = -EBUSY;
}
in there?
Very confused Al...
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: sget() misuse in nilfs 2009-05-03 22:51 sget() misuse in nilfs Al Viro @ 2009-05-04 17:11 ` Ryusuke Konishi 2009-05-05 8:18 ` Al Viro 0 siblings, 1 reply; 7+ messages in thread From: Ryusuke Konishi @ 2009-05-04 17:11 UTC (permalink / raw) To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel Hi, On Sun, 3 May 2009 23:51:36 +0100, Al Viro wrote: > OK, I give up; what _is_ get_sb/remount code supposed to implement? Oh, these functions lack spec comments. I first explain some specs. (I think part of them should be added on the code, later) The nilfs_get_sb() allocates a new super_block struct (sb) or assigns an existing sb to the mount point through sget(). For newly allocated sb it calls nilfs_fill_super() for initialization. The following things are supposed here: 1) Every rw-mount on a device shares the same sb (as usual). 2) Every sb of snapshot is independent with that of rw-mount or other snapshots if their checkpoint numbers differ. On the other hand, two or more snapshots having the same checkpoint number share a sb wherever possible. 3) Snapshots are mountable concurrently with a rw-mount, but a ro-mount is not so because the ro-mount cannot follow changes brought by the rw-mount. 4) Every sb on a device shares the same nilfs struct (i.e. the_nilfs), which stores their common information. The existing nilfs struct is acquired from an existing sb. Otherwise, a new nilfs struct is allocated through alloc_nilfs() function. The lifetime of the nilfs struct is managed with a reference counter. The nilfs_remount() function follows the change by given mount options. This does not replace sb, so it involves the following actions and confirmation: a) A segment constructor (i.e. log writer of nilfs) is invoked on ro->rw remount, and it is shut down for ro->rw remount. b) ro->rw remount is possible only if there is no rw-mount on the device and the checkpoint number of the ro-mount is latest. c) Remounting snapshot to different checkpoints or rw-mount is not allowed. > Not to mention the reliability of down_read_trylock() in there (what > happens if somebody tries to e.g. remount fs we are looking at the > time? That's right, s_umount held exclusive, down_read_trulock() > failing, fs instance skipped during the search), what is all that > stuff trying to achieve? It tries to find out the existing snapshot instance having a checkpoint number which equals with the specified cp number (2). Yeah, it may fail in the situation. I allow the miss at present because it's harmless and rare though it wastes memory due to unwanted duplication of the snapshot instance. > What protects MS_RDONLY in sd->s_flags during these searches? sd->s_flags is a local variable of the call sites of sget(), but, ugh, s->s_flags is not protected. It may have problem... BTW, MS_RDONLY in s->s_flags is also referred to in get_sb_bdev() and do_remount_sb(). How is the protection achieved for these functions? For example, ext3_remount() or ext3_handle_error() manipulates this flag, and get_sb_bdev() or do_remount_sb() could see it for other existing fs-instances. > What makes parse_options() in remount straight into sb (with reversal > if we'd done something bad) safe? Well, it's not protected except lock_kernel() in the call sites. I should add an argument to the parse_options() in order to forbid destructive changes like ext3/4 does. > Do we ever reassign snapshot_cno > other than on sb creation and remounting between r/o and r/w? We don't allow reassigning the snapshot_cno. But, ugh, nilfs_remount() is rewriting it temporarily. It seems bad. I'll fix it. > Is there any reason why we free sbi early (== in put_super()) and > not after kill_block_super() in ->kill_sb() of your own? No, just I didn't think we have to delay it until ->kill_sb() because bdev->bd_mount_sem and s->s_umount are protecting ->s_fs_info until removing the sb from the type->fs_supers list and the super_blocks list. > That alone would make sbi stay with superblock for as long as it > could be found by any means, killing the locking mess in your test > callbacks. If the advice turned out to be required or help for simplification, I'll take it in. I need some time to think about it. > Can SNAPSHOT even be there unless you have MS_RDONLY? Yes, it can. Nilfs snapshots can exist concurrently with rw-mount. > And what are the rules for exclusion in case of r/w mounts? (answered in the first explanation) > What, do we get -EBUSY on attempt to mount r/w something that is > already mounted r/w (instead of simply sharing superblock, as other > filesystems would do)? > > Or am I misreading that > } else if (!(s->s_flags & MS_RDONLY)) { > err = -EBUSY; > } > in there? Hmm, this looks strange to me, too. Nilfs is sharing a r/w-mount as described in (1). I'll confirm the -EBUSY case. > Very confused Al... Sorry. I recongize these are rather confusing. The use of sget() and the test routines should be reduced to a minimum if feasible. Anyway, thanks for many questions and comments. Regards, Ryusuke Konishi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sget() misuse in nilfs 2009-05-04 17:11 ` Ryusuke Konishi @ 2009-05-05 8:18 ` Al Viro 2009-05-05 15:37 ` Ryusuke Konishi 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2009-05-05 8:18 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-kernel, linux-fsdevel On Tue, May 05, 2009 at 02:11:29AM +0900, Ryusuke Konishi wrote: > Hi, > On Sun, 3 May 2009 23:51:36 +0100, Al Viro wrote: > > OK, I give up; what _is_ get_sb/remount code supposed to implement? > > Oh, these functions lack spec comments. > > I first explain some specs. (I think part of them should be added on > the code, later) > > The nilfs_get_sb() allocates a new super_block struct (sb) or assigns > an existing sb to the mount point through sget(). For newly allocated > sb it calls nilfs_fill_super() for initialization. > > The following things are supposed here: > 1) Every rw-mount on a device shares the same sb (as usual). OK. That's the first kind of sb; no MS_RDONLY, no SNAPSHOT, snapshot_cno is 0. > 2) Every sb of snapshot is independent with that of rw-mount or other > snapshots if their checkpoint numbers differ. On the other hand, > two or more snapshots having the same checkpoint number share a sb > wherever possible. Umm... That's what, MS_RDONLY, SNAPSHOT, snapshot_cno > 0? > 3) Snapshots are mountable concurrently with a rw-mount, but a > ro-mount is not so because the ro-mount cannot follow changes > brought by the rw-mount. And ro-mount would be MS_RDONLY, no SNAPSHOT, snapshot_cno equal to the nilfs_last_cno()? > b) ro->rw remount is possible only if there is no rw-mount on the > device and the checkpoint number of the ro-mount is latest. Er... How could there be an rw-mount while we have ro one? Your (3) above would seem to prohibit that situation... > c) Remounting snapshot to different checkpoints or rw-mount is not > allowed. Where is the second part checked in the current code? > > Can SNAPSHOT even be there unless you have MS_RDONLY? > > Yes, it can. Nilfs snapshots can exist concurrently with rw-mount. On the same superblock, that is... And AFAICS the answer's "no, it can't" (we can have rw superblock and snapshot superblock at the same time, but those will be different instances of struct superblock). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sget() misuse in nilfs 2009-05-05 8:18 ` Al Viro @ 2009-05-05 15:37 ` Ryusuke Konishi 2009-05-05 16:37 ` Al Viro 0 siblings, 1 reply; 7+ messages in thread From: Ryusuke Konishi @ 2009-05-05 15:37 UTC (permalink / raw) To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel On Tue, 5 May 2009 09:18:11 +0100, Al Viro wrote: > On Tue, May 05, 2009 at 02:11:29AM +0900, Ryusuke Konishi wrote: > > Hi, > > On Sun, 3 May 2009 23:51:36 +0100, Al Viro wrote: > > > OK, I give up; what _is_ get_sb/remount code supposed to implement? > > > > Oh, these functions lack spec comments. > > > > I first explain some specs. (I think part of them should be added on > > the code, later) > > > > The nilfs_get_sb() allocates a new super_block struct (sb) or assigns > > an existing sb to the mount point through sget(). For newly allocated > > sb it calls nilfs_fill_super() for initialization. > > > > The following things are supposed here: > > > 1) Every rw-mount on a device shares the same sb (as usual). > > OK. That's the first kind of sb; no MS_RDONLY, no SNAPSHOT, snapshot_cno is 0. > > > 2) Every sb of snapshot is independent with that of rw-mount or other > > snapshots if their checkpoint numbers differ. On the other hand, > > two or more snapshots having the same checkpoint number share a sb > > wherever possible. > > Umm... That's what, MS_RDONLY, SNAPSHOT, snapshot_cno > 0? Yes, exactly. > > 3) Snapshots are mountable concurrently with a rw-mount, but a > > ro-mount is not so because the ro-mount cannot follow changes > > brought by the rw-mount. > > And ro-mount would be MS_RDONLY, no SNAPSHOT, snapshot_cno equal to the > nilfs_last_cno()? Yes. > > b) ro->rw remount is possible only if there is no rw-mount on the > > device and the checkpoint number of the ro-mount is latest. > > Er... How could there be an rw-mount while we have ro one? Your > (3) above would seem to prohibit that situation... Oh, meaning of the (b) was ambiguous. How about the following one? b) Remounting an ro-mount to read-only is possible only if the checkpoint number of the target ro-mount is latest and there is no existent rw-mount. > > device and the checkpoint number of the ro-mount is latest. > > c) Remounting snapshot to different checkpoints or rw-mount is not > > allowed. > > Where is the second part checked in the current code? Ah, the second part was wrong. The snapshot with latest checkpoint number can be remounted into an rw-mount. So it should be: c) Remounting a snapshot to a different checkpoint is not allowed. Remounting a snapshot to an rw-mount is possible only if the target snapshot equals to the latest checkpoint. > > > Can SNAPSHOT even be there unless you have MS_RDONLY? > > > > Yes, it can. Nilfs snapshots can exist concurrently with rw-mount. > > On the same superblock, that is... And AFAICS the answer's "no, it can't" > (we can have rw superblock and snapshot superblock at the same time, but > those will be different instances of struct superblock). You are right. It's my misunderstanding. You meant SNAPSHOT for the NILFS_MOUNT_SNAPSHOT flag on sb->s_mount_opt (it does appear as SNAPSHOT on the code). So, the answer is "no, it can't". Regards, Ryusuke Konishi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sget() misuse in nilfs 2009-05-05 15:37 ` Ryusuke Konishi @ 2009-05-05 16:37 ` Al Viro 2009-05-06 6:28 ` Ryusuke Konishi 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2009-05-05 16:37 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel On Wed, May 06, 2009 at 12:37:29AM +0900, Ryusuke Konishi wrote: > Oh, meaning of the (b) was ambiguous. How about the following one? > > b) Remounting an ro-mount to read-only is possible only if the > checkpoint number of the target ro-mount is latest and there is no > existent rw-mount. > > c) Remounting a snapshot to a different checkpoint is not allowed. > Remounting a snapshot to an rw-mount is possible only if the > target snapshot equals to the latest checkpoint. That's really rather messy... Let's see if I've got it right: * r/w -> r/w. Allowed. * r/w -> r/o. Allowed. * r/w -> snapshot. Not allowed. * snapshot -> r/w. Allowed if it's the latest one and no r/w is there. * snapshot -> r/o. It remains a snapshot, but says it has succeeded. * snapshot -> snapshot. Only if it's the same. * r/o -> r/w. Allowed [1] * r/o -> r/o. Allowed. * r/o -> snapshot. Allowed only if the snapshot number is the latest. r/w can't coexist with r/o, but can coexist with any snapshots. Can't be remounted to a snapshot directly, but can go through r/w->r/o->latest snapshot in two mount -o remount. "r/o" in the above means "read-only, SNAPSHOT flag not set". What happens if you mount the thing r/w, remount it r/o and then try to mount the latest snapshot? Will that give two superblocks or will it reuse the r/o mount? OTOH, what will happen if you take r/w mount, mount the latest snapshot and then remount the r/w one to r/o? [1] there couldn't have been new r/w mount while r/o one existed, snapshot number couldn't have changed and the only possible transition *into* r/o is from r/w, so another r/w superblock couldn't have survived since before our superblock has become r/o. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sget() misuse in nilfs 2009-05-05 16:37 ` Al Viro @ 2009-05-06 6:28 ` Ryusuke Konishi 2009-05-06 16:09 ` Ryusuke Konishi 0 siblings, 1 reply; 7+ messages in thread From: Ryusuke Konishi @ 2009-05-06 6:28 UTC (permalink / raw) To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel On Tue, 5 May 2009 17:37:37 +0100, Al Viro wrote: > On Wed, May 06, 2009 at 12:37:29AM +0900, Ryusuke Konishi wrote: > > Oh, meaning of the (b) was ambiguous. How about the following one? > > > > b) Remounting an ro-mount to read-only is possible only if the > > checkpoint number of the target ro-mount is latest and there is no > > existent rw-mount. > > > > c) Remounting a snapshot to a different checkpoint is not allowed. > > Remounting a snapshot to an rw-mount is possible only if the > > target snapshot equals to the latest checkpoint. > > That's really rather messy... Let's see if I've got it right: > > * r/w -> r/w. Allowed. > * r/w -> r/o. Allowed. > * r/w -> snapshot. Not allowed. > * snapshot -> r/w. Allowed if it's the latest one and no r/w is there. > * snapshot -> r/o. It remains a snapshot, but says it has succeeded. Ah, this transition was not assumed. It needs some fix. > * snapshot -> snapshot. Only if it's the same. > * r/o -> r/w. Allowed [1] > * r/o -> r/o. Allowed. > * r/o -> snapshot. Allowed only if the snapshot number is the latest. Look correct. > r/w can't coexist with r/o, but can coexist with any snapshots. > Can't be remounted to a snapshot directly, but can go through > r/w->r/o->latest snapshot in two mount -o remount. Hmm, right. It looks half-baked. The transition "r/w -> latest snapshot" should be allowed to ensure consistency. > "r/o" in the above means "read-only, SNAPSHOT flag not set". > > What happens if you mount the thing r/w, remount it r/o and then try to > mount the latest snapshot? Will that give two superblocks or will it > reuse the r/o mount? It will reuse the r/o mount, which was originally r/w mount. > OTOH, what will happen if you take r/w mount, mount the latest snapshot and > then remount the r/w one to r/o? In that case, the latest snapshot and the r/o-mount will coexist as two different instances. > [1] there couldn't have been new r/w mount while r/o one existed, snapshot > number couldn't have changed and the only possible transition *into* r/o is > from r/w, so another r/w superblock couldn't have survived since before our > superblock has become r/o. I'd rather simplify things. If we treat read-only mount as the latest snapshot at the time (though we didn't take this interpretation), the transitions can be reduced to: * r/w -> r/w. Allowed. * r/w -> snapshot. Allowed if no checkpoint number was given (or the latest checkpoint was specified) * snapshot -> r/w. Allowed if it's the latest one and no r/w is there. * snapshot -> snapshot. Only if it's the same. Right? But it still needs test_exclusive_mount(). The test_exclusive_mount() may be eliminable by adding rw-mount-exists flag on the_nilfs struct. I'll take some thinking. Regards, Ryusuke Konishi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: sget() misuse in nilfs 2009-05-06 6:28 ` Ryusuke Konishi @ 2009-05-06 16:09 ` Ryusuke Konishi 0 siblings, 0 replies; 7+ messages in thread From: Ryusuke Konishi @ 2009-05-06 16:09 UTC (permalink / raw) To: viro; +Cc: konishi.ryusuke, linux-kernel, linux-fsdevel Hi, On Wed, 06 May 2009 15:28:59 +0900 (JST), Ryusuke Konishi wrote: > If we treat read-only mount as the latest snapshot at the time (though > we didn't take this interpretation), the transitions can be reduced > to: > > * r/w -> r/w. Allowed. > * r/w -> snapshot. Allowed if no checkpoint number was given (or the > latest checkpoint was specified) > * snapshot -> r/w. Allowed if it's the latest one and no r/w is there. > * snapshot -> snapshot. Only if it's the same. > > Right? Ah, I had forgotten garbage collection (GC). GC can break checkpoints which are not marked as snapshot. ro-mount cannot coexist with rw-mount because GC works while an rw-mount is there. Sorry, the above interpretation was not easily realized. > But it still needs test_exclusive_mount(). > > The test_exclusive_mount() may be eliminable by adding rw-mount-exists > flag on the_nilfs struct. I'll take some thinking. The elimination of test_exclusive_mount() was possible by this method if we can treat ro-mount as the latest checkpoint at the time. I'd like to consider if a similiar elimination is possible in case that ro-mount and rw-mount cannot coexist. Regards, Ryusuke Konishi ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-05-06 16:12 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-03 22:51 sget() misuse in nilfs Al Viro 2009-05-04 17:11 ` Ryusuke Konishi 2009-05-05 8:18 ` Al Viro 2009-05-05 15:37 ` Ryusuke Konishi 2009-05-05 16:37 ` Al Viro 2009-05-06 6:28 ` Ryusuke Konishi 2009-05-06 16:09 ` Ryusuke Konishi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).