* [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
@ 2014-10-14 10:36 Jeff Layton
  2014-10-14 11:11 ` Jeff Layton
From: Jeff Layton @ 2014-10-14 10:36 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, trond.myklebust, tao.peng

As best I can tell, a superblock is only ever removed from the fs_supers
list after the s_active counter goes to zero. Thus, the hlist_unhashed
check here is redundant, since the atomic_inc_not_zero check in
grab_super will catch that case anyway.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
---
 fs/super.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index eae088f6aaae..7ff277547c1e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -320,9 +320,7 @@ EXPORT_SYMBOL(deactivate_super);
  *	and want to turn it into a full-blown active reference.  grab_super()
  *	is called with sb_lock held and drops it.  Returns 1 in case of
  *	success, 0 if we had failed (superblock contents was already dead or
- *	dying when grab_super() had been called).  Note that this is only
- *	called for superblocks not in rundown mode (== ones still on ->fs_supers
- *	of their type), so increment of ->s_count is OK here.
+ *	dying when grab_super() had been called).
  */
 static int grab_super(struct super_block *s) __releases(sb_lock)
 {
@@ -641,8 +639,6 @@ struct super_block *get_active_super(struct block_device *bdev)
 restart:
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (hlist_unhashed(&sb->s_instances))
-			continue;
 		if (sb->s_bdev == bdev) {
 			if (!grab_super(sb))
 				goto restart;
-- 
1.9.3


* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
  2014-10-14 10:36 [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super Jeff Layton
@ 2014-10-14 11:11 ` Jeff Layton
  2014-10-14 19:13   ` Al Viro
From: Jeff Layton @ 2014-10-14 11:11 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, trond.myklebust, tao.peng

On Tue, 14 Oct 2014 06:36:55 -0400
Jeff Layton <jlayton@primarydata.com> wrote:

> As best I can tell, a superblock is only ever removed from the fs_supers
> list after the s_active counter goes to zero. Thus, the hlist_unhashed
> check here is redundant, since the atomic_inc_not_zero check in
> grab_super will catch that case anyway.
> 
> Cc: Trond Myklebust <trond.myklebust@primarydata.com>
> Cc: Peng Tao <tao.peng@primarydata.com>
> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> ---
>  fs/super.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index eae088f6aaae..7ff277547c1e 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -320,9 +320,7 @@ EXPORT_SYMBOL(deactivate_super);
>   *	and want to turn it into a full-blown active reference.  grab_super()
>   *	is called with sb_lock held and drops it.  Returns 1 in case of
>   *	success, 0 if we had failed (superblock contents was already dead or
> - *	dying when grab_super() had been called).  Note that this is only
> - *	called for superblocks not in rundown mode (== ones still on ->fs_supers
> - *	of their type), so increment of ->s_count is OK here.
> + *	dying when grab_super() had been called).
>   */
>  static int grab_super(struct super_block *s) __releases(sb_lock)
>  {
> @@ -641,8 +639,6 @@ struct super_block *get_active_super(struct block_device *bdev)
>  restart:
>  	spin_lock(&sb_lock);
>  	list_for_each_entry(sb, &super_blocks, s_list) {
> -		if (hlist_unhashed(&sb->s_instances))
> -			continue;
>  		if (sb->s_bdev == bdev) {
>  			if (!grab_super(sb))
>  				goto restart;

Ugh. Now that I've sent the patch, I think this is bogus, so I'll go
ahead and NAK it myself...

The problem is not the s_active counter, but the fact that grab_super
bumps the s_count unconditionally. So, this code seems to be using the
hlist_unhashed check to verify that it's ok to bump the s_count there.

That said, I don't quite get why it uses that as the check -- would it
not be more straightforward to simply have grab_super and other callers
check for a 0->1 transition of the s_count?
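
For reference, the grab_super() being discussed looks roughly like this
(paraphrased sketch, not necessarily the exact tree contents):

static int grab_super(struct super_block *s) __releases(sb_lock)
{
	s->s_count++;			/* unconditional ->s_count bump */
	spin_unlock(&sb_lock);
	down_write(&s->s_umount);
	if ((s->s_flags & MS_BORN) && atomic_inc_not_zero(&s->s_active)) {
		put_super(s);		/* drop the temporary ->s_count reference */
		return 1;
	}
	up_write(&s->s_umount);
	put_super(s);
	return 0;
}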

-- 
Jeff Layton <jlayton@primarydata.com>

* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
  2014-10-14 11:11 ` Jeff Layton
@ 2014-10-14 19:13   ` Al Viro
  2014-10-14 19:19     ` Jeff Layton
From: Al Viro @ 2014-10-14 19:13 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fsdevel, trond.myklebust, tao.peng

On Tue, Oct 14, 2014 at 07:11:34AM -0400, Jeff Layton wrote:

> Ugh. Now that I've sent the patch, I think this is bogus, so I'll go
> ahead and NAK it myself...
> 
> The problem is not the s_active counter, but the fact that grab_super
> bumps the s_count unconditionally. So, this code seems to be using the
> hlist_unhashed check to verify that it's ok to bump the s_count there.
> 
> That said, I don't quite get why it uses that as the check -- would it
> not be more straightforward to simply have grab_super and other callers
> check for a 0->1 transition of the s_count?

Because there might be *other* holders of ->s_count waiting to run down.
Basically, you reintroduce a nasty livelock that way.  Once the
superblock is past the shutdown, ->s_count can go only down.  Somebody
might be holding it while waiting for ->s_umount to be acquired, but
all such processes will detect that it's dead as soon as they get
->s_umount, drop it and drop ->s_count.  Allowing e.g. grab_super() to
_increment_ ->s_count and try to get ->s_umount at that stage would
reopen a livelock we used to have.

* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
  2014-10-14 19:13   ` Al Viro
@ 2014-10-14 19:19     ` Jeff Layton
  2014-10-14 23:05       ` Al Viro
From: Jeff Layton @ 2014-10-14 19:19 UTC (permalink / raw)
  To: Al Viro; +Cc: Jeff Layton, linux-fsdevel, trond.myklebust, tao.peng

On Tue, 14 Oct 2014 20:13:26 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Oct 14, 2014 at 07:11:34AM -0400, Jeff Layton wrote:
> 
> > Ugh. Now that I've sent the patch, I think this is bogus, so I'll go
> > ahead and NAK it myself...
> > 
> > The problem is not the s_active counter, but the fact that grab_super
> > bumps the s_count unconditionally. So, this code seems to be using the
> > hlist_unhashed check to verify that it's ok to bump the s_count there.
> > 
> > That said, I don't quite get why it uses that as the check -- would it
> > not be more straightforward to simply have grab_super and other callers
> > check for a 0->1 transition of the s_count?
> 
> Because there might be *other* holders of ->s_count waiting to run down.
> Basically, you reintroduce a nasty livelock that way.  Once the
> superblock is past the shutdown, ->s_count can go only down.  Somebody
> might be holding it while waiting for ->s_umount to be acquired, but
> all such processes will detect that it's dead as soon as they get
> ->s_umount, drop it and drop ->s_count.  Allowing e.g. grab_super() to
> _increment_ ->s_count and try to get ->s_umount at that stage would
> reopen a livelock we used to have.

Ok, got it. Thanks for the clarification, Al!

-- 
Jeff Layton <jlayton@primarydata.com>

* Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
  2014-10-14 19:19     ` Jeff Layton
@ 2014-10-14 23:05       ` Al Viro
From: Al Viro @ 2014-10-14 23:05 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fsdevel, trond.myklebust, tao.peng

On Tue, Oct 14, 2014 at 03:19:11PM -0400, Jeff Layton wrote:
> Ok, got it. Thanks for the clarification, Al!

FWIW, the life cycle for superblocks looks like this:

* Allocated: all instances are created in this state, all by alloc_super().
Invisible to global data structures.  ->s_umount held exclusive, ->s_count is
1, ->s_active is 1, no MS_BORN in flags.  Possible successors: Implanted,
Freeing; transition happens within sget() (which is the sole caller of
alloc_super()).

* Implanted: set up by 'set()' callback of sget() sufficiently to be
recognizable by 'test()' callback of the same and inserted into super_blocks
and type->fs_supers (all under sb_lock).  Possible successors: SettingUp,
Freeing.  The latter happens if 'set()' fails, the former if it succeeds;
sb_lock is dropped upon either transition (both are within sget()).
->s_count is 1, ->s_active is 1, ->s_umount held exclusive, !MS_BORN

* SettingUp: in super_blocks and type->fs_supers, ->s_active is still 1,
->s_count > 0, !MS_BORN.  That's the state in which new instances are
returned by sget() to its callers.  ->s_umount might be dropped and
regained; in the end it is dropped.  Subsequent sget() attempts on the
same fs will block until this instance leaves that state.  No ->s_active
increments are allowed.  That's when the bulk of filesystem setup is being
done.  Possible successors: Born, ShuttingDown (depending on whether that
setup attempt succeeds or fails).  Instances in that state are seen by
->mount().  Transition to Born consists of setting MS_BORN and dropping
->s_umount; transition to ShuttingDown - call of deactivate_locked_super().

* Born: in super_blocks and type->fs_supers, ->s_umount is not held,
->s_active > 0, ->s_count > 0, MS_BORN is set, ->s_root is non-NULL.
That's the normal state; fs is fully set up and active.  ->s_active
increments and decrements are possible.  ->s_active can reach 0 only
with ->s_umount held exclusive - that happens only in deactivate_locked_super()
and moves us to ShuttingDown state.  That's the only possible successor.
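
The 1 -> 0 transition itself, roughly (paraphrasing deactivate_locked_super();
shrinker teardown and such elided):

void deactivate_locked_super(struct super_block *s)
{
	struct file_system_type *fs = s->s_type;

	if (atomic_dec_and_test(&s->s_active)) {
		/* last active reference gone: Born -> ShuttingDown */
		fs->kill_sb(s);		/* must end up calling generic_shutdown_super() */
		put_filesystem(fs);
		put_super(s);
	} else {
		up_write(&s->s_umount);
	}
}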

* ShuttingDown: still in super_blocks and type->fs_supers, ->s_umount is
held exclusive, ->s_active is 0 (and will never increment after that point),
->s_count is positive, MS_BORN is set.  That's where the fs shutdown happens.
At some point in ->kill_sb() we must call generic_shutdown_super(), which
will do the type-independent part of shutdown, including dentry tree
freeing, inode eviction, etc.  And, in the end, removes it from
type->fs_supers (protected by sb_lock) and drops ->s_umount.  At that point
we are in RunDown state.  In principle, dropping and regaining ->s_umount
in ShuttingDown state is allowed, as long as it's not done until ->s_root
has been made NULL (by shrink_dcache_for_umount() from
generic_shutdown_super()), but I don't know of any fs type that would
do it for any reasons.

* RunDown: still in super_blocks. ->s_count > 0, ->s_active is 0, MS_BORN is
set, ->s_root is NULL.  No increments of ->s_count from that point; there
might be processes blocked on contended attempts to get ->s_umount, but as
soon as they get it and see that the superblock is in that state, they drop
->s_umount and decrement ->s_count.  Once ->s_count reaches 0, we remove
it from super_blocks and move to Freeing (again, all manipulations of lists
and of ->s_count happen under sb_lock).

* Freeing: what it says.  We free that sucker.  That's in destroy_super().
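
Condensed, the states above line up roughly like this (informal summary only,
not actual kernel code):

enum sb_lifecycle_state {
	SB_ALLOCATED,		/* alloc_super(): ->s_count 1, ->s_active 1,
				 * ->s_umount held exclusive, !MS_BORN, on no lists */
	SB_IMPLANTED,		/* on super_blocks and type->fs_supers, still !MS_BORN */
	SB_SETTING_UP,		/* what sget() returns for a new fs; setup in progress */
	SB_BORN,		/* MS_BORN set, ->s_root != NULL, ->s_active > 0 */
	SB_SHUTTING_DOWN,	/* ->s_active hit 0, ->s_umount held, ->kill_sb() runs */
	SB_RUNDOWN,		/* off type->fs_supers, ->s_root NULL, only ->s_count left */
	SB_FREEING,		/* ->s_count hit 0, destroy_super() frees it */
};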

There are some extra complications from RCU pathwalk.  Namely, we need to
make sure that all freeing of data structures needed by LOOKUP_RCU ->lookup(),
LOOKUP_RCU ->d_revalidate(), ->d_manage(..., true) and to ->d_hash() and
->d_compare() (i.e. all fs code that can be called from RCU pathwalk) won't
happen without RCU delay.  For struct super_block itself it is guaranteed
by use of kfree_rcu() in destroy_super(); for the fs type *module* it's the usual
logic in module_put() (we hold a reference to the module from a bit before
entering SettingUp to a bit after the transition to RunDown).  Grace periods for
dentries and inodes are dealt with by the VFS, provided that references to
them are not leaked by the fs driver.  Filesystem-private data structures
are the responsibility of the fs driver itself, of course.
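
The kfree_rcu() part mentioned above, roughly (sketch; per-sb list/counter
teardown elided):

static void destroy_super(struct super_block *s)
{
	/* ... lru lists, security blob etc. torn down here ... */
	WARN_ON(!list_empty(&s->s_mounts));
	kfree(s->s_subtype);
	kfree(s->s_options);
	kfree_rcu(s, rcu);	/* actual freeing is RCU-delayed, so RCU pathwalk
				 * can still safely look at the superblock */
}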

Note that the instance remains on the super_blocks list until it's about to get
freed.  That simplifies a lot of logic in list traversals - we walk the list
under sb_lock and, as long as we bump ->s_count before dropping sb_lock,
we can be sure that the instance will stay on the list until we do the matching
decrement.  That's the reason for the asymmetry between super_blocks and
type->fs_supers.
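
That pattern, roughly as the super_blocks walkers (e.g. iterate_supers()) use it
(sketch; f/arg stand for whatever per-sb work the caller wants done):

	struct super_block *sb, *p = NULL;

	spin_lock(&sb_lock);
	list_for_each_entry(sb, &super_blocks, s_list) {
		if (hlist_unhashed(&sb->s_instances))
			continue;	/* already past ShuttingDown - leave it alone */
		sb->s_count++;		/* not in RunDown, so this is allowed */
		spin_unlock(&sb_lock);

		down_read(&sb->s_umount);
		if (sb->s_root && (sb->s_flags & MS_BORN))
			f(sb, arg);
		up_read(&sb->s_umount);

		spin_lock(&sb_lock);
		if (p)
			__put_super(p);	/* only dropped once we've advanced past it,
					 * so the list cursor stays valid */
		p = sb;
	}
	if (p)
		__put_super(p);
	spin_unlock(&sb_lock);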

To grab ->s_umount you need either guaranteed positive ->s_count or
guaranteed positive ->s_active (the latter guarantees the former).
Holding ->s_umount is enough to stabilize the state.
With ->s_umount grabbed by somebody other than the process doing the
lifecycle transitions, the following is true:
	->s_root is non-NULL => it's in SettingUp or Born state
	->s_active is positive => it's in SettingUp or Born state
	MS_BORN is set => it's in Born, ShuttingDown or RunDown state.
Note that checking MS_BORN *is* needed - in SettingUp ->s_umount is allowed to
be dropped and regained, so the first two tests do not distinguish SettingUp
from Born.  OTOH, a vfsmount's ->mnt_sb is guaranteed to be in Born state and to
remain such as long as the vfsmount is held.
With sb_lock held the following is true:
	it's on type->fs_supers <=> it's in SettingUp, Born or ShuttingDown

We are not allowed to increment ->s_active unless in Born state.
We are not allowed to increment ->s_count when in RunDown state.
Decrements of ->s_active from 1 to 0 are all done under ->s_umount by
deactivate_locked_super().
Changes of ->s_count and list manipulations are done under sb_lock.
All callers of sget() are in ->mount() instances.
Superblock stays on type->fs_supers from the beginning of setup to the end
of shutdown.  sget() while a matching fs is in SettingUp or ShuttingDown state
will block; sget() while such an fs is in Born state will bump its ->s_active
and return it with ->s_umount held exclusive.  Callers can tell whether they
got a new instance or an extra reference to an existing one by checking ->s_root
of what they get (NULL <=> it's a new instance).
All callers of generic_shutdown_super() are from ->kill_sb() instances and
it must be called by any such instance.
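
A typical ->mount() built on those rules looks something like this (sketch; the
myfs_* helpers are made-up placeholders):

static struct dentry *myfs_mount(struct file_system_type *fs_type,
				 int flags, const char *dev_name, void *data)
{
	struct super_block *s;
	int err;

	s = sget(fs_type, myfs_test_super, myfs_set_super, flags, data);
	if (IS_ERR(s))
		return ERR_CAST(s);

	if (!s->s_root) {
		/* new instance, still in SettingUp - do the actual setup */
		err = myfs_fill_super(s, data, flags & MS_SILENT ? 1 : 0);
		if (err) {
			deactivate_locked_super(s);	/* -> ShuttingDown, ->kill_sb() */
			return ERR_PTR(err);
		}
		s->s_flags |= MS_ACTIVE;
	}
	/* else: we got an extra active reference to an existing Born instance */

	return dget(s->s_root);	/* mount_fs() will set MS_BORN and drop ->s_umount */
}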

Note, BTW, that once we are in SettingUp state, the call of ->kill_sb() is
guaranteed, whether on normal fs shutdown or in case of setup failure, no
matter how early in setup that failure happens.  The rules are different for
the older method (->put_super()), which is called by generic_shutdown_super()
(itself called by ->kill_sb()) only if ->s_root has already been made non-NULL.
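
And the matching ->kill_sb() side, minimal sketch (again, the myfs_* names are
made up):

static void myfs_kill_sb(struct super_block *s)
{
	struct myfs_sb_info *sbi = s->s_fs_info;	/* hypothetical private data */

	generic_shutdown_super(s);	/* the mandatory type-independent part */
	kfree(sbi);			/* private bits are the driver's to free */
}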

The braindump above probably needs to be reordered into more readable form and
put into Documentation/filesystems/<something>; for now let it just sit in
list archives and wait...
