linux-fsdevel.vger.kernel.org archive mirror
* [rfc][possible solution] RCU vfsmounts
@ 2013-09-28 20:27 Al Viro
From: Al Viro @ 2013-09-28 20:27 UTC
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-kernel

	FWIW, I think I have a kinda-sorta solution for that and I'd like
to hear your comments on it.  I want to replace vfsmount_lock with a seqlock
and store an additional seq number in nameidata, set to vfsmount_seq at the
beginning of the walk and rechecked in unlazy_walk/complete_walk.
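For illustration only, the sample-then-recheck idea can be modelled in userspace roughly as below.  This is a sketch with C11 atomics, not kernel code: mount_seq, struct walk_state and the walk_*/mount_write_* helpers are hypothetical stand-ins for vfsmount_seq, nameidata and the seqlock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the proposed vfsmount_seq. */
static atomic_uint mount_seq;

/* Hypothetical, cut-down stand-in for struct nameidata. */
struct walk_state {
	unsigned m_seq;		/* mount_seq as sampled when the walk began */
};

/* Writer side: the counter is odd while the mount tree is being changed. */
static void mount_write_begin(void)
{
	atomic_fetch_add(&mount_seq, 1);
}

static void mount_write_end(void)
{
	assert(atomic_load(&mount_seq) & 1);	/* must pair with a begin */
	atomic_fetch_add(&mount_seq, 1);
}

/* Start of a lockless walk: wait out any writer, remember an even value. */
static void walk_begin(struct walk_state *ws)
{
	unsigned seq;

	while ((seq = atomic_load(&mount_seq)) & 1)
		;	/* a writer is in progress, sample again */
	ws->m_seq = seq;
}

/* Recheck before leaving lockless mode: any change since walk_begin() fails. */
static bool walk_still_valid(const struct walk_state *ws)
{
	return atomic_load(&mount_seq) == ws->m_seq;
}
```

A walk that sees walk_still_valid() fail has to fall back or restart; what it does with a mount reference it has already taken is the hard part.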

	The obvious variant would be to have unlazy_walk/complete_walk
grab a refcount, check vfsmount_seq and mntput on mismatch.  The trouble
with that is race with what would've been the final mntput() done by
umount(2); complete_walk() would drop that temporary reference and
fail, all right, but... we would get a umount(2) returning without having
actually shut the filesystem down.  Said shutdown would happen in whoever
had been doing pathname resolution that stepped into the race.
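One interleaving of that race, replayed single-threaded as a toy model (a sketch only; struct toy_mount, enum actor and naive_race() are made up for illustration and have nothing to do with the real struct mount):

```c
#include <assert.h>

enum actor { NOBODY, UMOUNT, PATHWALK };

/* Hypothetical toy mount: a refcount plus a record of who shut it down. */
struct toy_mount {
	int count;
	enum actor shut_down_by;
};

/* Drop a reference; whoever drops the last one does the shutdown. */
static void toy_mntput(struct toy_mount *m, enum actor who)
{
	assert(m->count > 0);
	if (--m->count == 0)
		m->shut_down_by = who;
}

/* Replays the naive variant and reports who ended up doing the shutdown. */
static enum actor naive_race(void)
{
	struct toy_mount m = { .count = 1, .shut_down_by = NOBODY };
	unsigned seq = 0;
	unsigned sampled = seq;		/* walker: seq taken at walk start */

	m.count++;			/* walker: grab a temporary reference */
	seq++;				/* umount: detach the mount, bump seq */
	toy_mntput(&m, UMOUNT);		/* umount: meant to be final, but isn't */
	/* umount(2) returns here with the filesystem still alive */
	if (sampled != seq)		/* walker: mismatch, drop the reference */
		toy_mntput(&m, PATHWALK);
	return m.shut_down_by;
}
```

In this ordering naive_race() reports PATHWALK, i.e. the shutdown lands in the pathname resolution rather than in umount.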

	I _think_ I have a workable variant:
	* new vfsmount flag (MNT_SYNC_UMOUNT or something like that) and
ability to tell umount_tree() to set that on all victims; done on
non-lazy umount and on expiry.  Never cleared once set, and set only
after propagate_mount_busy() has been called and found nothing busy.
Set before bumping vfsmount_seq.
	* rcu_barrier() added in namespace_unlock(), between
dropping namespace_sem and doing mntput() on the victims.
	* unlazy_walk() and complete_walk() use the common helper along
the lines of

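/* Called under rcu_read_lock(); drops it on every return path. */
static bool legitimize_mnt(struct vfsmount *mnt, unsigned seq)
{
	/* mount tree already changed since seq was sampled: give up */
	if (read_seqcount_retry(&vfsmount_seq, seq)) {
		rcu_read_unlock();
		return false;
	}
	mntget(mnt);
	/* nothing changed across the mntget(): the reference is good */
	if (!read_seqcount_retry(&vfsmount_seq, seq)) {
		rcu_read_unlock();
		return true;
	}
	if (mnt->mnt_flags & MNT_SYNC_UMOUNT) {
		/* it couldn't have gotten through rcu_barrier() yet */
		mnt_add_count(real_mount(mnt), -1);
		rcu_read_unlock();
		return false;
	}
	/* not a sync umount victim: a final mntput() from here is acceptable */
	rcu_read_unlock();
	mntput(mnt);
	return false;
}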
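To make the outcomes of that helper easier to follow, here is a single-threaded toy model of it.  Everything in it is hypothetical (struct toy_mnt, toy_seq, the "window" callback standing in for a umount that lands between the mntget and the second seq check); it only mirrors the control flow above and says nothing about RCU itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy mount, not the kernel's struct mount. */
struct toy_mnt {
	int count;
	bool sync_umount;	/* models MNT_SYNC_UMOUNT */
	bool shut_down;		/* set by whichever put turns out to be final */
};

static unsigned toy_seq;	/* models vfsmount_seq */

static void toy_mntput(struct toy_mnt *m)
{
	assert(m->count > 0);
	if (--m->count == 0)
		m->shut_down = true;
}

/* umount side: the flag (if any) is set before the seq bump. */
static void toy_umount_detach(struct toy_mnt *m, bool sync)
{
	if (sync)
		m->sync_umount = true;
	toy_seq++;
}

/* Sync umount: detached, then parked in "rcu_barrier()" before its put. */
static void sync_umount_window(struct toy_mnt *m)
{
	toy_umount_detach(m, true);
}

/* Lazy umount: detached and its reference dropped right away. */
static void lazy_umount_window(struct toy_mnt *m)
{
	toy_umount_detach(m, false);
	toy_mntput(m);
}

/* Same control flow as the helper above; window() simulates the racer. */
static bool toy_legitimize(struct toy_mnt *m, unsigned seq,
			   void (*window)(struct toy_mnt *))
{
	if (toy_seq != seq)
		return false;		/* already stale, nothing taken */
	m->count++;			/* mntget() */
	if (window)
		window(m);		/* concurrent umount lands here */
	if (toy_seq == seq)
		return true;		/* reference is good */
	if (m->sync_umount) {
		m->count--;		/* quiet drop, cannot be the final one */
		return false;
	}
	toy_mntput(m);			/* may legitimately be the final put */
	return false;
}
```

In the sync case the walker's drop leaves the count at 1, so the later put by umount is the final one; in the lazy case the walker's put is final, which is fine since nobody is waiting on it.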

Freeing vfsmounts would be done with RCU delay; vfsmount hash lookups,
d_path(), etc. would do the obvious things, as we do with rename_lock on the
dentry side of things - that stuff is all obvious.  Not ending up with the final
mntput() stolen from something that really expects it to be final is the
hard part, and it looks like the above would be a solution.

Comments?  AFAICS, that would've killed *all* vfsmount-related locked stores
in RCU-mode pathwalks...

