All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [rfc][possible solution] RCU vfsmounts
Date: Wed, 2 Oct 2013 02:30:13 +0100	[thread overview]
Message-ID: <20131002013013.GA11680@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20130930194921.GS13318@ZenIV.linux.org.uk>

On Mon, Sep 30, 2013 at 08:49:21PM +0100, Al Viro wrote:
> OK...  AFAICS, we are not too far from being able to handle RCU pathwalk
> straying into fs in the middle of being shut down.
> 	* There are 5 methods that can be called:
> ->d_hash(...)
> ->d_compare(...)
> ->d_revalidate(..., LOOKUP_RCU | ...)
> ->d_manage(..., true)
> ->permission(..., MAY_NOT_BLOCK | MAY_EXEC)
> Filesystem needs to be able to survive those during shutdown.  The stuff
> needed for that should _not_ be freed without synchronize_rcu() (or via
> call_rcu()); usually ->s_fs_info is involved (when anything is needed
> at all).  In any case, we shouldn't allow rmmod without making sure that
> everything in RCU mode has run out, but most of the filesystems have
> rcu_barrier() in their exit_module anyway.

... and unregister_filesystem() has synchronize_rcu(), which leaves takes
care of everything other than callbacks in fs code fed to call_rcu().
So the things are even less painful.

> 	* __put_super() probably ought to delay actual freeing via
> call_rcu(); might not be strictly necessary, but probably a good idea
> anyway.
> 	* shrink_dcache_for_umount() ought to use d_walk(), a-la
> shrink_dcache_parent().
> 
> Note that most of the filesystems don't have any of these methods or
> don't look at anything outside of inode/dentry involved in RCU case.
> Zoo:
> 
> * adfs: has the name length limit in fs-private part of superblock; used
> by ->d_hash() and ->d_compare().  No other methods involved, synchronize_rcu()
> before doing kfree() in adfs_put_super() will suffice.
> 
> * autofs4: wants fs-private part of superblock in ->d_manage().
> synchronize_rcu() in autofs4_kill_sb() would do it, or we could delay
> freeing that sucker via call_rcu() (in that case we want delayed
> freeing in __put_super() as well).
> 
> * btrfs: wants btrfs_root_readonly(BTRFS_I(inode)->root) usable in
> ->permission().  Delayed freeing of struct btrfs_root, perhaps?
> 
> * cifs: wants nls, refered to from fs-private part of superblock.
> ->permission() wants fs-private part of superblock as well.  Just
> synchronize_rcu() before unload_nls() in cifs_umount()...
> 
> * fat: same situation as with cifs
> 
> * fuse: delayed freeing of struct fuse_conn?  BTW, Miklos, just what is
>         } else if (mask & (MAY_ACCESS | MAY_CHDIR)) {
>                 if (mask & MAY_NOT_BLOCK)
>                         return -ECHILD;
> about, when we never pass such combinations?  Oh, well...
> 
> * hpfs: similar to cifs and fat, only without use of nls (a homegrown table
> of some sort).
> 
> * ncpfs: _probably_ similar to cifs et.al., but there might be dragons
> 
> * procfs: delayed freeing of pid_namespace?
> 
> * lustre: messy, haven't looked through that.

Extremely preliminary version is in vfs.git #experimental.  No fs-specific
fixes mentioned above are in there; relevant stuff is in the end of
queue -
      initialize namespace_sem statically
      fs_is_visible only needs namespace_sem held shared
      dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock
      do_remount(): pull touch_mnt_namespace() up
      fold mntfree() into mntput_no_expire()
      fs/namespace.c: bury long-dead define
      finish_automount() doesn't need vfsmount_lock for removal from expiry list
      mnt_set_expiry() doesn't need vfsmount_lock
      fold dup_mnt_ns() into its only surviving caller
      namespace.c: get rid of mnt_ghosts
      don't bother with vfsmount_lock in mounts_poll()
      new helpers: lock_mount_hash/unlock_mount_hash
      isofs: don't pass dentry to isofs_hash{i,}_common()
      uninline destroy_super(), consolidate alloc_super()
      split __lookup_mnt() in two functions
      move taking vfsmount_lock down into prepend_path()
      RCU'd vfsmounts
 fs/dcache.c           |  221 +++++++++++++----------------
 fs/internal.h         |    4 -
 fs/isofs/inode.c      |   12 +-
 fs/mount.h            |   20 +++-
 fs/namei.c            |   87 +++++------
 fs/namespace.c        |  386 +++++++++++++++++++++++++------------------------
 fs/pnode.c            |   13 +-
 fs/proc_namespace.c   |    8 +-
 fs/super.c            |  206 +++++++++++---------------
 include/linux/mount.h |    2 +
 include/linux/namei.h |    2 +-
 11 files changed, 455 insertions(+), 506 deletions(-)

With fs fixes added it'll probably still end up with slightly negative
balance...

It's barely tested (i.e. no beating for races, etc. - boots and shuts down
without any apparent problems, but that's it) and the last commit almost
certainly needs a splitup.  Everything prior to it can probably go into
-next and I hope to carve standalone pieces from it as well.  Review and
comments are welcome; personally, I prefer to use git fetch for review,
but if somebody prefers posted mailbomb, yell and I'll send it...

  reply	other threads:[~2013-10-02  1:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-28 20:27 [rfc][possible solution] RCU vfsmounts Al Viro
2013-09-28 20:43 ` Linus Torvalds
2013-09-29  6:06   ` Al Viro
2013-09-29 17:19     ` Linus Torvalds
2013-09-29 18:10       ` Al Viro
2013-09-29 18:26         ` Linus Torvalds
2013-09-30 10:48           ` Miklos Szeredi
2013-09-29 18:49         ` Al Viro
2013-09-29 19:04         ` Al Viro
2013-09-30 19:49         ` Al Viro
2013-10-02  1:30           ` Al Viro [this message]
2013-10-03  6:14           ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131002013013.GA11680@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.