Re: [rfc][possible solution] RCU vfsmounts

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [rfc][possible solution] RCU vfsmounts
Date: Wed, 2 Oct 2013 02:30:13 +0100	[thread overview]
Message-ID: <20131002013013.GA11680@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20130930194921.GS13318@ZenIV.linux.org.uk>

On Mon, Sep 30, 2013 at 08:49:21PM +0100, Al Viro wrote:
> OK...  AFAICS, we are not too far from being able to handle RCU pathwalk
> straying into fs in the middle of being shut down.
> 	* There are 5 methods that can be called:
> ->d_hash(...)
> ->d_compare(...)
> ->d_revalidate(..., LOOKUP_RCU | ...)
> ->d_manage(..., true)
> ->permission(..., MAY_NOT_BLOCK | MAY_EXEC)
> Filesystem needs to be able to survive those during shutdown.  The stuff
> needed for that should _not_ be freed without synchronize_rcu() (or via
> call_rcu()); usually ->s_fs_info is involved (when anything is needed
> at all).  In any case, we shouldn't allow rmmod without making sure that
> everything in RCU mode has run out, but most of the filesystems have
> rcu_barrier() in their exit_module anyway.

... and unregister_filesystem() has synchronize_rcu(), which leaves takes
care of everything other than callbacks in fs code fed to call_rcu().
So the things are even less painful.

> 	* __put_super() probably ought to delay actual freeing via
> call_rcu(); might not be strictly necessary, but probably a good idea
> anyway.
> 	* shrink_dcache_for_umount() ought to use d_walk(), a-la
> shrink_dcache_parent().
> 
> Note that most of the filesystems don't have any of these methods or
> don't look at anything outside of inode/dentry involved in RCU case.
> Zoo:
> 
> * adfs: has the name length limit in fs-private part of superblock; used
> by ->d_hash() and ->d_compare().  No other methods involved, synchronize_rcu()
> before doing kfree() in adfs_put_super() will suffice.
> 
> * autofs4: wants fs-private part of superblock in ->d_manage().
> synchronize_rcu() in autofs4_kill_sb() would do it, or we could delay
> freeing that sucker via call_rcu() (in that case we want delayed
> freeing in __put_super() as well).
> 
> * btrfs: wants btrfs_root_readonly(BTRFS_I(inode)->root) usable in
> ->permission().  Delayed freeing of struct btrfs_root, perhaps?
> 
> * cifs: wants nls, refered to from fs-private part of superblock.
> ->permission() wants fs-private part of superblock as well.  Just
> synchronize_rcu() before unload_nls() in cifs_umount()...
> 
> * fat: same situation as with cifs
> 
> * fuse: delayed freeing of struct fuse_conn?  BTW, Miklos, just what is
>         } else if (mask & (MAY_ACCESS | MAY_CHDIR)) {
>                 if (mask & MAY_NOT_BLOCK)
>                         return -ECHILD;
> about, when we never pass such combinations?  Oh, well...
> 
> * hpfs: similar to cifs and fat, only without use of nls (a homegrown table
> of some sort).
> 
> * ncpfs: _probably_ similar to cifs et.al., but there might be dragons
> 
> * procfs: delayed freeing of pid_namespace?
> 
> * lustre: messy, haven't looked through that.

Extremely preliminary version is in vfs.git #experimental.  No fs-specific
fixes mentioned above are in there; relevant stuff is in the end of
queue -
      initialize namespace_sem statically
      fs_is_visible only needs namespace_sem held shared
      dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock
      do_remount(): pull touch_mnt_namespace() up
      fold mntfree() into mntput_no_expire()
      fs/namespace.c: bury long-dead define
      finish_automount() doesn't need vfsmount_lock for removal from expiry list
      mnt_set_expiry() doesn't need vfsmount_lock
      fold dup_mnt_ns() into its only surviving caller
      namespace.c: get rid of mnt_ghosts
      don't bother with vfsmount_lock in mounts_poll()
      new helpers: lock_mount_hash/unlock_mount_hash
      isofs: don't pass dentry to isofs_hash{i,}_common()
      uninline destroy_super(), consolidate alloc_super()
      split __lookup_mnt() in two functions
      move taking vfsmount_lock down into prepend_path()
      RCU'd vfsmounts
 fs/dcache.c           |  221 +++++++++++++----------------
 fs/internal.h         |    4 -
 fs/isofs/inode.c      |   12 +-
 fs/mount.h            |   20 +++-
 fs/namei.c            |   87 +++++------
 fs/namespace.c        |  386 +++++++++++++++++++++++++------------------------
 fs/pnode.c            |   13 +-
 fs/proc_namespace.c   |    8 +-
 fs/super.c            |  206 +++++++++++---------------
 include/linux/mount.h |    2 +
 include/linux/namei.h |    2 +-
 11 files changed, 455 insertions(+), 506 deletions(-)

With fs fixes added it'll probably still end up with slightly negative
balance...

It's barely tested (i.e. no beating for races, etc. - boots and shuts down
without any apparent problems, but that's it) and the last commit almost
certainly needs a splitup.  Everything prior to it can probably go into
-next and I hope to carve standalone pieces from it as well.  Review and
comments are welcome; personally, I prefer to use git fetch for review,
but if somebody prefers posted mailbomb, yell and I'll send it...

next prev parent reply	other threads:[~2013-10-02  1:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-28 20:27 [rfc][possible solution] RCU vfsmounts Al Viro
2013-09-28 20:43 ` Linus Torvalds
2013-09-29  6:06   ` Al Viro
2013-09-29 17:19     ` Linus Torvalds
2013-09-29 18:10       ` Al Viro
2013-09-29 18:26         ` Linus Torvalds
2013-09-30 10:48           ` Miklos Szeredi
2013-09-29 18:49         ` Al Viro
2013-09-29 19:04         ` Al Viro
2013-09-30 19:49         ` Al Viro
2013-10-02  1:30           ` Al Viro [this message]
2013-10-03  6:14           ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131002013013.GA11680@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).