linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [rfc][possible solution] RCU vfsmounts
Date: Sat, 28 Sep 2013 21:27:29 +0100	[thread overview]
Message-ID: <20130928202728.GK13318@ZenIV.linux.org.uk> (raw)

	FWIW, I think I have a kinda-sorta solution for that and I'd like
to hear your comments on that.  I want to replace vfsmount_lock with seqlock
and store additional seq number in nameidata, set to vfsmount_seq in the
beginning and rechecked in unlazy_walk/complete_walk.

	The obvious variant would be to have unlazy_walk/complete_walk to
grab refcount, check vfsmount_seq and mntput on mismatch.  The trouble
with that is race with what would've been the final mntput() done by
umount(2); complete_walk() would drop that temporary reference and
fail, all right, but... we would get a umount(2) returning without having
actually shut the filesystem down.  Said shutdown would happen in whoever
had been doing pathname resolution that stepped into the race.

	I _think_ I have a workable variant:
	* new vfsmount flag (MNT_SYNC_UMOUNT or something like that) and
ability to tell umount_tree() to set that on all victims; done on
non-lazy umount and on expiry.  Never cleared once set, and set only
when propagate_mount_busy() has been called and returned true.
Set before bumping vfsmount_seq.
	* rcu_barrier() added in namespace_unlock(), between
dropping namespace_sem and doing mntput() on the victims.
	* unlazy_walk() and complete_walk() use the common helper along
the lines of

legitimize_mnt(struct vfsmount *mnt, unsigned seq)
{
	if (read_seqcount_retry(&vfsmount_seq, seq)) {
		rcu_read_unlock();
		return false;
	}
	mntget(mnt);
	if (!read_seqcount_retry(&vfsmount_seq, seq)) {
		rcu_read_unlock();
		return true;
	}
	if (mnt->mnt_flags & MNT_SYNC_UMOUNT) {
		/* it couldn't have gotten through rcu_barrier() yet */
		mnt_add_count(real_mount(mnt), -1);
		rcu_read_unlock();
		return false;
	}
	rcu_read_unlock();
	mntput(mnt);
	return false;
}

Freeing vfsmounts would be done with rcu delay, vfsmount hash lookups,
d_path(), etc. do the obvious things as we do with rename_lock for dentry
side of things - that stuff is all obvious.  Not ending up with final
mntput() stolen from something that really expects it to be final is the
hard part and it looks like the above would be a solution.

Comments?  AFAICS, that would've killed *all* vfsmount-related locked stores
in RCU-mode pathwalks...

             reply	other threads:[~2013-09-28 20:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-28 20:27 Al Viro [this message]
2013-09-28 20:43 ` [rfc][possible solution] RCU vfsmounts Linus Torvalds
2013-09-29  6:06   ` Al Viro
2013-09-29 17:19     ` Linus Torvalds
2013-09-29 18:10       ` Al Viro
2013-09-29 18:26         ` Linus Torvalds
2013-09-30 10:48           ` Miklos Szeredi
2013-09-29 18:49         ` Al Viro
2013-09-29 19:04         ` Al Viro
2013-09-30 19:49         ` Al Viro
2013-10-02  1:30           ` Al Viro
2013-10-03  6:14           ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130928202728.GK13318@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).