From: Al Viro <viro@ZenIV.linux.org.uk>
To: Jeff Layton <jeff.layton@primarydata.com>
Cc: linux-fsdevel@vger.kernel.org, trond.myklebust@primarydata.com,
tao.peng@primarydata.com
Subject: Re: [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super
Date: Wed, 15 Oct 2014 00:05:20 +0100 [thread overview]
Message-ID: <20141014230520.GA7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20141014151911.2581f362@tlielax.poochiereds.net>
On Tue, Oct 14, 2014 at 03:19:11PM -0400, Jeff Layton wrote:
> Ok, got it. Thanks for the clarification, Al!
FWIW, the life cycle for superblocks looks so:
* Allocated: all instances are created in this state, all by alloc_super().
Invisible to global data structures. ->s_umount held exclusive, ->s_count is
1, ->s_active is 1, no MS_BORN in flags. Possible successors: Implanted,
Freeing; transition happens within sget() (which is the sole caller of
alloc_super()).
* Implanted: set up by 'set()' callback of sget() sufficiently to be
recognizable by 'test()' callback of the same and inserted into super_blocks
and type->fs_supers (all under sb_lock). Possible successors: SettingUp,
Freeing. The latter happens if 'set()' fails, the latter - if it succeeds,
sb_lock is dropped upon either transition (both are within sget()).
->s_count is 1, ->s_active is 1, ->s_umount held exclusive, !MS_BORN
* SettingUp: in super_blocks and type->fs_supers, ->s_active is still 1,
->s_count > 0, !MS_BORN. That's the state in which new instances are
returned by sget() to its callers. ->s_umount might be dropped and
regained; in the end it is dropped. Subsequent sget() attempts on the
same fs will block until this instance leaves that state. No ->s_active
increments are allowed. That's when the bulk of filesystem setup is being
done. Possible successors: Born, ShuttingDown (depending on whether that
setup attempt succeeds or fails). Instances in that state are seen by
->mount(). Transition to Born consists of setting MS_BORN and dropping
->s_umount; transition to ShuttingDown - call of deactivate_locked_super().
* Born: in super_blocks and type->fs_supers, ->s_umount is not held,
->s_active > 0, ->s_count > 0, MS_BORN is set, ->s_root is non-NULL.
That's the normal state; fs is fully set up and active. ->s_active
increments and decrements are possible. ->s_active can reach 0 only
with ->s_umount held exclusive - that happens only in deactivate_locked_super()
and moves us to ShuttingDown state. That's the only possible successor.
* ShuttingDown: still in super_blocks and type->fs_supers, ->s_umount is
held exclusive, ->s_active is 0 (and will never increment after that point),
->s_count is positive, MS_BORN is set. That's where the fs shutdown happens.
At some point in ->kill_sb() we must call generic_shutdown_super(), which
will do the type-independent part of shutdown, including dentry tree
freeing, inode eviction, etc. And, in the end, removes it from
type->fs_supers (protected by sb_lock) and drops ->s_umount. At that point
we are in RunDown state. In principle, dropping and regaining ->s_umount
in ShuttingDown state is allowed, as long as it's not done until ->s_root
has been made NULL (by shrink_dcache_for_umount() from
generic_shutdown_super()), but I don't know of any fs type that would
do it for any reasons.
* Rundown: still in super_blocks. ->s_count > 0, ->s_active is 0, MS_BORN is
set, ->s_root is NULL. No increments of ->s_count from that point; there
might be processes blocked on contended attempts to get ->s_umount, but as
soon as they get it and see that superblock is in that state, they drop
->s_umount and decrement ->s_count. Once ->s_count reaches 0, we remove
it from super_blocks and move to Freeing (again, all manipulations of lists
and of ->s_count happen under sb_lock).
* Freeing: what is says. We free that sucker. That's in destroy_super().
There are some extra complications from RCU pathwalk. Namely, we need to
make sure that all freeing of data structures needed by LOOKUP_RCU ->lookup(),
LOOKUP_RCU ->d_revalidate(), ->d_manage(..., true) and to ->d_hash() and
->d_compare() (i.e. all fs code that can be called from RCU pathwalk) won't
happen without RCU delay. For struct super_block itself it is guaranteed
by use of kfree_rcu() in destroy_super(), for fs type *module* it's usual
logics in module_put() (we hold a reference to module from a bit before
entering SettingUp to a bit after transition to RunDown). Grace periods for
dentries and inodes are dealt with by VFS, provided that they references to
them are not leaked by fs driver. Filesystem-private data structures
are responsibility of fs driver itself, of course.
Note that instance remains on the super_blocks until it's about to get
freed. That simplifies a lot of logics in list traversals - we walk it
under sb_lock and as long as we bump ->s_count before dropping sb_lock,
we can be sure that the instance will stay on list until we do the matching
decrement. That's the reason for asymmetry between super_blocks and
type->fs_supers.
To grab ->s_umount you need either guaranteed positive ->s_count or
guaranteed positive ->s_atomic (the latter guarantees the former).
Holding ->s_umount is enough to stabilize the state.
With ->s_umount grabbed by somebody other than the process doing the
lifecycle transitions, the following is true:
->s_root is non-NULL => it's in SettingUp or Born state
->s_active is positive => it's in SettingUp or Born state
MS_BORN is set => it's in Born, ShuttingDown or RunDown state.
Note that checking MS_BORN *is* needed - in SettingUp we allow to drop
and regain ->s_umount, so the first two tests do not distinguish SettingUp
from Born. OTOH, ->mnt_sb is guaranteed to be in Born state and remain
such until vfsmount is held.
With sb_lock held the following is true:
it's on type->fs_supers <=> it's in SettingUp, Born or ShuttingDown
We are not allowed to increment ->s_active unless in Born state.
We are not allowed to increment ->s_count when in RunDown state.
Decrements of ->s_active from 1 to 0 are all done under ->s_umount by
deactivate_locked_super().
Changes of ->s_count and lists manipulations are done by sb_lock.
All callers of sget() are in ->mount() instances.
Superblock stays on type->fs_supers from the beginning of setup to the end
of shutdown. sget() while a matching fs is in SettingUp or ShuttingDown state
will block; sget() while such fs is in Born state will bump its ->s_active,
and return it with s_umount held exclusive. Callers can tell whether they
got a new instance or an extra reference to existing one by checking ->s_root
of what they get (NULL <=> it's a new instance).
All callers of generic_shutdown_super() are from ->kill_sb() instances and
it must be called by any such instance.
Note, BTW, that once we are in SettingUp state, the call of ->kill_sb() is
guaranteed, whether on normal fs shutdown or in case of setup failure, no
matter how early in setup that failure happens. The rules are different for
older method (->put_super()), which is called by generic_shutdown_super()
(itself called by ->kill_sb()) only if ->s_root has already been made non-NULL.
The braindump above probably needs to be reordered into more readable form and
put into Documentation/filesystems/<something>; for now let it just sit in
list archives and wait...
prev parent reply other threads:[~2014-10-14 23:05 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-14 10:36 [PATCH] vfs: remove unneeded hlist_unhashed check from get_active_super Jeff Layton
2014-10-14 11:11 ` Jeff Layton
2014-10-14 19:13 ` Al Viro
2014-10-14 19:19 ` Jeff Layton
2014-10-14 23:05 ` Al Viro [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141014230520.GA7996@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=jeff.layton@primarydata.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tao.peng@primarydata.com \
--cc=trond.myklebust@primarydata.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).