* [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
From: Jeff Layton @ 2014-10-14 10:36 UTC
To: Al Viro; +Cc: linux-fsdevel, trond.myklebust, tao.peng

As best I can tell, a superblock is only ever removed from the fs_supers
list after the s_active counter goes to zero. Thus, the hlist_unhashed
check here is redundant, since the atomic_inc_not_zero check in
grab_super will catch that case anyway.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/super.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index eae088f6aaae..7ff277547c1e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -320,9 +320,7 @@ EXPORT_SYMBOL(deactivate_super);
  * and want to turn it into a full-blown active reference. grab_super()
  * is called with sb_lock held and drops it. Returns 1 in case of
  * success, 0 if we had failed (superblock contents was already dead or
- * dying when grab_super() had been called). Note that this is only
- * called for superblocks not in rundown mode (== ones still on ->fs_supers
- * of their type), so increment of ->s_count is OK here.
+ * dying when grab_super() had been called).
  */
 static int grab_super(struct super_block *s) __releases(sb_lock)
 {
@@ -641,8 +639,6 @@ struct super_block *get_active_super(struct block_device *bdev)
 restart:
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (hlist_unhashed(&sb->s_instances))
-			continue;
 		if (sb->s_bdev == bdev) {
 			if (!grab_super(sb))
 				goto restart;
-- 
1.9.3
* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
From: Jeff Layton @ 2014-10-14 11:11 UTC
To: Al Viro; +Cc: linux-fsdevel, trond.myklebust, tao.peng

On Tue, 14 Oct 2014 06:36:55 -0400
Jeff Layton <jlayton@primarydata.com> wrote:

> As best I can tell, a superblock is only ever removed from the fs_supers
> list after the s_active counter goes to zero. Thus, the hlist_unhashed
> check here is redundant, since the atomic_inc_not_zero check in
> grab_super will catch that case anyway.
>
> Cc: Trond Myklebust <trond.myklebust@primarydata.com>
> Cc: Peng Tao <tao.peng@primarydata.com>
> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> ---
>  fs/super.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index eae088f6aaae..7ff277547c1e 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -320,9 +320,7 @@ EXPORT_SYMBOL(deactivate_super);
>   * and want to turn it into a full-blown active reference. grab_super()
>   * is called with sb_lock held and drops it. Returns 1 in case of
>   * success, 0 if we had failed (superblock contents was already dead or
> - * dying when grab_super() had been called). Note that this is only
> - * called for superblocks not in rundown mode (== ones still on ->fs_supers
> - * of their type), so increment of ->s_count is OK here.
> + * dying when grab_super() had been called).
>   */
>  static int grab_super(struct super_block *s) __releases(sb_lock)
>  {
> @@ -641,8 +639,6 @@ struct super_block *get_active_super(struct block_device *bdev)
>  restart:
>  	spin_lock(&sb_lock);
>  	list_for_each_entry(sb, &super_blocks, s_list) {
> -		if (hlist_unhashed(&sb->s_instances))
> -			continue;
>  		if (sb->s_bdev == bdev) {
>  			if (!grab_super(sb))
>  				goto restart;

Ugh. Now that I've sent the patch, I think this is bogus, so I'll go
ahead and NAK it myself...

The problem is not the s_active counter, but the fact that grab_super
bumps the s_count unconditionally. So, this code seems to be using the
hlist_unhashed check to verify that it's ok to bump the s_count there.

That said, I don't quite get why it uses that as the check -- would it
not be more straightforward to simply have grab_super and other callers
check for a 0->1 transition of the s_count?

-- 
Jeff Layton <jlayton@primarydata.com>
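[For reference, grab_super() at this point in time looked roughly like
the following -- paraphrased from fs/super.c of the 3.17 era, a sketch
for orientation rather than a verbatim quote. The unconditional
->s_count bump Jeff refers to is the first line of the body:]

static int grab_super(struct super_block *s) __releases(sb_lock)
{
	s->s_count++;			/* unconditional passive ref */
	spin_unlock(&sb_lock);
	down_write(&s->s_umount);
	if ((s->s_flags & MS_BORN) && atomic_inc_not_zero(&s->s_active)) {
		put_super(s);		/* trade passive ref for active one */
		return 1;
	}
	up_write(&s->s_umount);
	put_super(s);			/* dead or dying: back off */
	return 0;
}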
* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
From: Al Viro @ 2014-10-14 19:13 UTC
To: Jeff Layton; +Cc: linux-fsdevel, trond.myklebust, tao.peng

On Tue, Oct 14, 2014 at 07:11:34AM -0400, Jeff Layton wrote:

> Ugh. Now that I've sent the patch, I think this is bogus, so I'll go
> ahead and NAK it myself...
>
> The problem is not the s_active counter, but the fact that grab_super
> bumps the s_count unconditionally. So, this code seems to be using the
> hlist_unhashed check to verify that it's ok to bump the s_count there.
>
> That said, I don't quite get why it uses that as the check -- would it
> not be more straightforward to simply have grab_super and other callers
> check for a 0->1 transition of the s_count?

Because there might be *other* holders of ->s_count waiting to run down.
Basically, you reintroduce a nasty livelock that way. Once the
superblock is past the shutdown, ->s_count can go only down. Somebody
might be holding it while waiting for ->s_umount to be acquired, but
all such processes will detect that it's dead as soon as they get
->s_umount, drop it and drop ->s_count. Allowing e.g. grab_super() to
_increment_ ->s_count and try to get ->s_umount at that stage would
reopen a livelock we used to have.
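[The pattern Al describes -- pin with ->s_count, block on ->s_umount,
then re-check liveness and back off if the superblock has died -- is
visible in the super_blocks walkers. A sketch loosely modeled on
iterate_supers() from fs/super.c of that era, with comments added; not
a verbatim quote:]

void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
{
	struct super_block *sb, *p = NULL;

	spin_lock(&sb_lock);
	list_for_each_entry(sb, &super_blocks, s_list) {
		if (hlist_unhashed(&sb->s_instances))
			continue;
		sb->s_count++;			/* pin: sb stays on super_blocks */
		spin_unlock(&sb_lock);

		down_read(&sb->s_umount);	/* may block while umount runs */
		if (sb->s_root && (sb->s_flags & MS_BORN))
			f(sb, arg);		/* still alive */
		up_read(&sb->s_umount);		/* dead => just back off */

		spin_lock(&sb_lock);
		if (p)
			__put_super(p);		/* drop the previous pin; the
						 * current pin keeps the list
						 * traversal safe */
		p = sb;
	}
	if (p)
		__put_super(p);
	spin_unlock(&sb_lock);
}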
* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
From: Jeff Layton @ 2014-10-14 19:19 UTC
To: Al Viro; +Cc: Jeff Layton, linux-fsdevel, trond.myklebust, tao.peng

On Tue, 14 Oct 2014 20:13:26 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Oct 14, 2014 at 07:11:34AM -0400, Jeff Layton wrote:
> > Ugh. Now that I've sent the patch, I think this is bogus, so I'll go
> > ahead and NAK it myself...
> >
> > The problem is not the s_active counter, but the fact that grab_super
> > bumps the s_count unconditionally. So, this code seems to be using the
> > hlist_unhashed check to verify that it's ok to bump the s_count there.
> >
> > That said, I don't quite get why it uses that as the check -- would it
> > not be more straightforward to simply have grab_super and other callers
> > check for a 0->1 transition of the s_count?
>
> Because there might be *other* holders of ->s_count waiting to run down.
> Basically, you reintroduce a nasty livelock that way. Once the
> superblock is past the shutdown, ->s_count can go only down. Somebody
> might be holding it while waiting for ->s_umount to be acquired, but
> all such processes will detect that it's dead as soon as they get
> ->s_umount, drop it and drop ->s_count. Allowing e.g. grab_super() to
> _increment_ ->s_count and try to get ->s_umount at that stage would
> reopen a livelock we used to have.

Ok, got it. Thanks for the clarification, Al!

-- 
Jeff Layton <jlayton@primarydata.com>
* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
From: Al Viro @ 2014-10-14 23:05 UTC
To: Jeff Layton; +Cc: linux-fsdevel, trond.myklebust, tao.peng

On Tue, Oct 14, 2014 at 03:19:11PM -0400, Jeff Layton wrote:

> Ok, got it. Thanks for the clarification, Al!

FWIW, the life cycle for superblocks looks so:

* Allocated: all instances are created in this state, all by
alloc_super(). Invisible to global data structures. ->s_umount held
exclusive, ->s_count is 1, ->s_active is 1, no MS_BORN in flags.
Possible successors: Implanted, Freeing; transition happens within
sget() (which is the sole caller of alloc_super()).

* Implanted: set up by the 'set()' callback of sget() sufficiently to be
recognizable by the 'test()' callback of the same and inserted into
super_blocks and type->fs_supers (all under sb_lock). Possible
successors: SettingUp, Freeing. The latter happens if 'set()' fails,
the former - if it succeeds; sb_lock is dropped upon either transition
(both are within sget()). ->s_count is 1, ->s_active is 1, ->s_umount
held exclusive, !MS_BORN.

* SettingUp: in super_blocks and type->fs_supers, ->s_active is still 1,
->s_count > 0, !MS_BORN. That's the state in which new instances are
returned by sget() to its callers. ->s_umount might be dropped and
regained; in the end it is dropped. Subsequent sget() attempts on the
same fs will block until this instance leaves that state. No ->s_active
increments are allowed. That's when the bulk of filesystem setup is
being done. Possible successors: Born, ShuttingDown (depending on
whether that setup attempt succeeds or fails). Instances in that state
are seen by ->mount(). Transition to Born consists of setting MS_BORN
and dropping ->s_umount; transition to ShuttingDown - a call of
deactivate_locked_super().

* Born: in super_blocks and type->fs_supers, ->s_umount is not held,
->s_active > 0, ->s_count > 0, MS_BORN is set, ->s_root is non-NULL.
That's the normal state; fs is fully set up and active. ->s_active
increments and decrements are possible. ->s_active can reach 0 only
with ->s_umount held exclusive - that happens only in
deactivate_locked_super() and moves us to ShuttingDown state. That's
the only possible successor.

* ShuttingDown: still in super_blocks and type->fs_supers, ->s_umount
is held exclusive, ->s_active is 0 (and will never increment after that
point), ->s_count is positive, MS_BORN is set. That's where the fs
shutdown happens. At some point in ->kill_sb() we must call
generic_shutdown_super(), which will do the type-independent part of
shutdown, including dentry tree freeing, inode eviction, etc. And, in
the end, it removes the superblock from type->fs_supers (protected by
sb_lock) and drops ->s_umount. At that point we are in RunDown state.
In principle, dropping and regaining ->s_umount in ShuttingDown state
is allowed, as long as it's not done until ->s_root has been made NULL
(by shrink_dcache_for_umount() from generic_shutdown_super()), but I
don't know of any fs type that would do it for any reason.

* RunDown: still in super_blocks. ->s_count > 0, ->s_active is 0,
MS_BORN is set, ->s_root is NULL. No increments of ->s_count from that
point; there might be processes blocked on contended attempts to get
->s_umount, but as soon as they get it and see that the superblock is
in that state, they drop ->s_umount and decrement ->s_count.
Once ->s_count reaches 0, we remove it from super_blocks and move to
Freeing (again, all manipulations of lists and of ->s_count happen
under sb_lock).

* Freeing: what it says. We free that sucker. That's in destroy_super().

There are some extra complications from RCU pathwalk. Namely, we need
to make sure that all freeing of data structures needed by LOOKUP_RCU
->lookup(), LOOKUP_RCU ->d_revalidate(), ->d_manage(..., true) and by
->d_hash() and ->d_compare() (i.e. all fs code that can be called from
RCU pathwalk) won't happen without an RCU delay. For struct super_block
itself it is guaranteed by the use of kfree_rcu() in destroy_super();
for the fs type *module* it's the usual logic in module_put() (we hold
a reference to the module from a bit before entering SettingUp to a bit
after the transition to RunDown). Grace periods for dentries and inodes
are dealt with by the VFS, provided that references to them are not
leaked by the fs driver. Filesystem-private data structures are the
responsibility of the fs driver itself, of course.

Note that an instance remains on super_blocks until it's about to get
freed. That simplifies a lot of logic in list traversals - we walk the
list under sb_lock, and as long as we bump ->s_count before dropping
sb_lock, we can be sure that the instance will stay on the list until
we do the matching decrement. That's the reason for the asymmetry
between super_blocks and type->fs_supers.

To grab ->s_umount you need either a guaranteed positive ->s_count or a
guaranteed positive ->s_active (the latter guarantees the former).
Holding ->s_umount is enough to stabilize the state. With ->s_umount
grabbed by somebody other than the process doing the lifecycle
transitions, the following is true:

	->s_root is non-NULL   => it's in SettingUp or Born state
	->s_active is positive => it's in SettingUp or Born state
	MS_BORN is set         => it's in Born, ShuttingDown or RunDown state

Note that checking MS_BORN *is* needed - in SettingUp we allow dropping
and regaining ->s_umount, so the first two tests do not distinguish
SettingUp from Born. OTOH, ->mnt_sb is guaranteed to be in Born state
and remains such as long as the vfsmount is held.

With sb_lock held the following is true:

	it's on type->fs_supers <=> it's in SettingUp, Born or ShuttingDown

We are not allowed to increment ->s_active unless in Born state. We are
not allowed to increment ->s_count when in RunDown state. Decrements of
->s_active from 1 to 0 are all done under ->s_umount, by
deactivate_locked_super(). Changes of ->s_count and list manipulations
are done under sb_lock.

All callers of sget() are in ->mount() instances. A superblock stays on
type->fs_supers from the beginning of setup to the end of shutdown.
sget() while a matching fs is in SettingUp or ShuttingDown state will
block; sget() while such an fs is in Born state will bump its
->s_active and return it with ->s_umount held exclusive. Callers can
tell whether they got a new instance or an extra reference to an
existing one by checking ->s_root of what they get (NULL <=> it's a
new instance).

All callers of generic_shutdown_super() are from ->kill_sb() instances,
and it must be called by any such instance. Note, BTW, that once we are
in SettingUp state, the call of ->kill_sb() is guaranteed, whether on
normal fs shutdown or in case of setup failure, no matter how early in
setup that failure happens. The rules are different for the older
method (->put_super()), which is called by generic_shutdown_super()
(itself called by ->kill_sb()) only if ->s_root has already been made
non-NULL.
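[To illustrate the sget() contract described above, here is a composite
sketch of the usual calling pattern in a ->mount() instance, loosely
modeled on mount_nodev()/mount_bdev() of that era. The example_*
helpers are made up for illustration; it is mount_fs() that sets
MS_BORN and drops ->s_umount after ->mount() returns:]

/* Composite sketch; example_test/example_set/example_fill_super are
 * hypothetical helpers, not real kernel functions. */
static struct dentry *example_mount(struct file_system_type *fs_type,
				    int flags, void *data)
{
	struct super_block *s;
	int error;

	s = sget(fs_type, example_test, example_set, flags, data);
	if (IS_ERR(s))
		return ERR_CAST(s);	/* never got past Allocated/Implanted */

	if (!s->s_root) {
		/* new instance: SettingUp, ->s_umount held, ->s_active == 1 */
		error = example_fill_super(s, data, flags & MS_SILENT ? 1 : 0);
		if (error) {
			/* SettingUp -> ShuttingDown; ->kill_sb() is
			 * guaranteed even for an early failure like this */
			deactivate_locked_super(s);
			return ERR_PTR(error);
		}
		s->s_flags |= MS_ACTIVE;
	}
	/* else: sget() bumped ->s_active of an existing Born superblock */

	return dget(s->s_root);		/* still holding ->s_umount; mount_fs()
					 * will set MS_BORN and drop it */
}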
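[Purely as a reading aid, the states above can be compressed into the
following illustrative enum. The kernel keeps no such thing - the state
is implicit in ->s_count, ->s_active, MS_BORN and list membership:]

enum sb_lifecycle {		/* illustrative only - not kernel code */
	SB_ALLOCATED,		/* alloc_super(): invisible; s_count = 1,
				 * s_active = 1, !MS_BORN, s_umount held */
	SB_IMPLANTED,		/* set() succeeded; on super_blocks and
				 * type->fs_supers */
	SB_SETTING_UP,		/* returned by sget(); fs setup runs,
				 * no s_active increments allowed */
	SB_BORN,		/* MS_BORN set, s_umount dropped; normal
				 * operation, s_active may rise and fall */
	SB_SHUTTING_DOWN,	/* s_active hit 0 under s_umount;
				 * ->kill_sb() / generic_shutdown_super() */
	SB_RUNDOWN,		/* off type->fs_supers; s_count only falls */
	SB_FREEING,		/* off super_blocks; destroy_super(), struct
				 * freed after an RCU delay */
};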
The braindump above probably needs to be reordered into more readable
form and put into Documentation/filesystems/<something>; for now let it
just sit in list archives and wait...