[REVIEW][PATCH 0/4] vfs: Detach mounts on unlink

Linux Container Development
 help / color / mirror / Atom feed

* [REVIEW][PATCH 0/4] vfs: Detach mounts on unlink
       [not found]                                 ` <87d2nb8dxy.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-15 20:15                                   ` Eric W. Biederman
       [not found]                                     ` <87iowyxpci.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
       [not found]                                     ` <87d2n6xpan.fsf_-_@xmission.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-15 20:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linux Containers, Kernel Mailing List, Andy Lutomirski, Al Viro,
	Linux-Fsdevel, Matthias Schniedermeyer, Linus Torvalds

This patchset is an addresses two problems:
1) Not all modifications to the filesystems happen through the vfs and
   since the vfs can not cope with a mount point being unlinked or
   renamed filesystems whose modifications that do not come through the
   vfs are required to lie.

2) Through an oversight it is now possible for one unprivileged user to
   mount something on another unprivileged users dentry and make it
   impossible for the other user to unlink or rename that dentry.

It is now technically possible to easily lift the restriction on
unlinking and renaming files with mount points on them, with a
corresponding reduction in complexity of the vfs semantics.

After review it seems that there are no objections to this approach as
long as we retain the -EBUSY semantics for rmdir, unlink, and rename of
mount points in the current mount namespace.  The first patch in this
series now adds those local mount namespace restrictions.

All of the review comments should now be addressed and folded in, and
I have take a careful look and it appears what I have is now correct
and complete.  So I am posting this for one last round of review.

Al if you want to take this through the vfs tree, point me at a branch
and I will give you versions of these patches that apply cleanly there.
Otherwise I will push these patches to my userns tree as soon as all of
these patches pass review.

Eric W. Biederman (4):
      vfs: Don't allow overwriting mounts in the current mount namespace
      vfs: Keep a list of mounts on a mount point
      vfs: Add a function to lazily unmount all mounts from any dentry. v3
      vfs: Lazily remove mounts on unlinked files and directories. v2

 fs/afs/dir.c           |    3 +-
 fs/dcache.c            |   80 ++++++++++++++++++++----------------------------
 fs/fuse/dir.c          |    3 +-
 fs/gfs2/dentry.c       |    4 +--
 fs/mount.h             |    3 ++
 fs/namei.c             |   55 +++++++++++++++++++++------------
 fs/namespace.c         |   30 ++++++++++++++++++
 fs/nfs/dir.c           |    5 +--
 fs/sysfs/dir.c         |    9 +-----
 include/linux/dcache.h |    3 +-
 10 files changed, 108 insertions(+), 87 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                     ` <87iowyxpci.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-15 20:16                                       ` Eric W. Biederman
  2013-10-15 20:17                                       ` [REVIEW][PATCH 2/4] vfs: Keep a list of mounts on a mount point Eric W. Biederman
                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-15 20:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linux Containers, Kernel Mailing List, Andy Lutomirski, Al Viro,
	Linux-Fsdevel, Matthias Schniedermeyer, Linus Torvalds

In preparation for allowing mountpoints to be renamed and unlinked
in remote filesystems and in other mount namespaces test if on a dentry
there is a mount in the local mount namespace before allowing it to
be renamed or unlinked.

The primary motivation here are old versions of fusermount unmount
which is not safe if the a path can be renamed or unlinked while it is
verifying the mount is safe to unmount.  More recent versions are simpler
and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
in a directory owned by an arbitrary user.

Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> reports this is approach is good
enough to remove concerns about new kernels mixed with old versions
of fusermount.

A secondary motivation for restrictions here is that it removing empty
directories that have non-empty mount points on them appears to
violate the rule that rmdir can not remove empty directories.  As
Linus Torvalds pointed out this is useful for programs (like git) that
test if a directory is empty with rmdir.

Therefore this patch arranges to enforce the existing mount point
semantics for local mount namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namei.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 645268f23eb6..df7bd6e9c7b6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3547,6 +3547,20 @@ void dentry_unhash(struct dentry *dentry)
 	spin_unlock(&dentry->d_lock);
 }

+static bool covered(struct vfsmount *mnt, struct dentry *dentry)
+{
+	/* test to see if a dentry is covered with a mount in
+	 * the current mount namespace.
+	 */
+	bool is_covered;
+
+	rcu_read_lock();
+	is_covered = d_mountpoint(dentry) && __lookup_mnt(mnt, dentry, 1);
+	rcu_read_unlock();
+
+	return is_covered;
+}
+
 int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	int error = may_delete(dir, dentry, 1);
@@ -3622,6 +3636,9 @@ retry:
 		error = -ENOENT;
 		goto exit3;
 	}
+	error = -EBUSY;
+	if (covered(nd.path.mnt, dentry))
+		goto exit3;
 	error = security_path_rmdir(&nd.path, dentry);
 	if (error)
 		goto exit3;
@@ -3716,6 +3733,9 @@ retry:
 		inode = dentry->d_inode;
 		if (!inode)
 			goto slashes;
+		error = -EBUSY;
+		if (covered(nd.path.mnt, dentry))
+			goto exit2;
 		ihold(inode);
 		error = security_path_unlink(&nd.path, dentry);
 		if (error)
@@ -4164,6 +4184,11 @@ retry:
 	error = -ENOTEMPTY;
 	if (new_dentry == trap)
 		goto exit5;
+	error = -EBUSY;
+	if (covered(oldnd.path.mnt, old_dentry))
+		goto exit5;
+	if (covered(newnd.path.mnt, new_dentry))
+		goto exit5;

 	error = security_path_rename(&oldnd.path, old_dentry,
 				     &newnd.path, new_dentry);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [REVIEW][PATCH 2/4] vfs: Keep a list of mounts on a mount point
       [not found]                                     ` <87iowyxpci.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-10-15 20:16                                       ` [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace Eric W. Biederman
@ 2013-10-15 20:17                                       ` Eric W. Biederman
       [not found]                                         ` <877gdexp9s.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-10-15 20:17                                       ` [REVIEW][PATCH 3/4] vfs: Add a function to lazily unmount all mounts from any dentry. v3 Eric W. Biederman
  2013-10-15 20:18                                       ` [REVIEW][PATCH 4/4] vfs: Lazily remove mounts on unlinked files and directories. v2 Eric W. Biederman
  3 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-15 20:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linux Containers, Kernel Mailing List, Andy Lutomirski, Al Viro,
	Linux-Fsdevel, Matthias Schniedermeyer, Linus Torvalds


To spot any possible problems call BUG if a mountpoint
is put when it's list of mounts is not empty.

Signed-off-by: Eric W. Biederman <ebiederman-1v8oiQdgUNlBDgjK7y7TUQ@public.gmane.org>
---
 fs/mount.h     |    2 ++
 fs/namespace.c |    6 ++++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 64a858143ff9..e4342b8dfab1 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -21,6 +21,7 @@ struct mnt_pcp {
 struct mountpoint {
 	struct list_head m_hash;
 	struct dentry *m_dentry;
+	struct list_head m_list;
 	int m_count;
 };
 
@@ -47,6 +48,7 @@ struct mount {
 	struct mount *mnt_master;	/* slave is on master->mnt_slave_list */
 	struct mnt_namespace *mnt_ns;	/* containing namespace */
 	struct mountpoint *mnt_mp;	/* where is it mounted */
+	struct list_head mnt_mp_list;	/* list mounts with the same mountpoint */
 #ifdef CONFIG_FSNOTIFY
 	struct hlist_head mnt_fsnotify_marks;
 	__u32 mnt_fsnotify_mask;
diff --git a/fs/namespace.c b/fs/namespace.c
index da5c49483430..e4fe22c23e00 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -197,6 +197,7 @@ static struct mount *alloc_vfsmnt(const char *name)
 		INIT_LIST_HEAD(&mnt->mnt_share);
 		INIT_LIST_HEAD(&mnt->mnt_slave_list);
 		INIT_LIST_HEAD(&mnt->mnt_slave);
+		INIT_LIST_HEAD(&mnt->mnt_mp_list);
 #ifdef CONFIG_FSNOTIFY
 		INIT_HLIST_HEAD(&mnt->mnt_fsnotify_marks);
 #endif
@@ -636,6 +637,7 @@ static struct mountpoint *new_mountpoint(struct dentry *dentry)
 	mp->m_dentry = dentry;
 	mp->m_count = 1;
 	list_add(&mp->m_hash, chain);
+	INIT_LIST_HEAD(&mp->m_list);
 	return mp;
 }
 
@@ -643,6 +645,7 @@ static void put_mountpoint(struct mountpoint *mp)
 {
 	if (!--mp->m_count) {
 		struct dentry *dentry = mp->m_dentry;
+		BUG_ON(!list_empty(&mp->m_list));
 		spin_lock(&dentry->d_lock);
 		dentry->d_flags &= ~DCACHE_MOUNTED;
 		spin_unlock(&dentry->d_lock);
@@ -689,6 +692,7 @@ static void detach_mnt(struct mount *mnt, struct path *old_path)
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
 	list_del_init(&mnt->mnt_child);
 	list_del_init(&mnt->mnt_hash);
+	list_del_init(&mnt->mnt_mp_list);
 	put_mountpoint(mnt->mnt_mp);
 	mnt->mnt_mp = NULL;
 }
@@ -705,6 +709,7 @@ void mnt_set_mountpoint(struct mount *mnt,
 	child_mnt->mnt_mountpoint = dget(mp->m_dentry);
 	child_mnt->mnt_parent = mnt;
 	child_mnt->mnt_mp = mp;
+	list_add_tail(&child_mnt->mnt_mp_list, &mp->m_list);
 }
 
 /*
@@ -1191,6 +1196,7 @@ void umount_tree(struct mount *mnt, int propagate)
 		list_del_init(&p->mnt_child);
 		if (mnt_has_parent(p)) {
 			p->mnt_parent->mnt_ghosts++;
+			list_del_init(&p->mnt_mp_list);
 			put_mountpoint(p->mnt_mp);
 			p->mnt_mp = NULL;
 		}
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [REVIEW][PATCH 3/4] vfs: Add a function to lazily unmount all mounts from any dentry. v3
       [not found]                                     ` <87iowyxpci.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-10-15 20:16                                       ` [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace Eric W. Biederman
  2013-10-15 20:17                                       ` [REVIEW][PATCH 2/4] vfs: Keep a list of mounts on a mount point Eric W. Biederman
@ 2013-10-15 20:17                                       ` Eric W. Biederman
       [not found]                                         ` <871u3mxp8s.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-10-15 20:18                                       ` [REVIEW][PATCH 4/4] vfs: Lazily remove mounts on unlinked files and directories. v2 Eric W. Biederman
  3 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-15 20:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linux Containers, Kernel Mailing List, Andy Lutomirski, Al Viro,
	Linux-Fsdevel, Matthias Schniedermeyer, Linus Torvalds


v2: Always drop the lock when exiting early.
v3: Make detach_mounts robust about freeing several
    mounts on the same mountpoint at one time, and remove
    the unneeded mnt_list list test.

Signed-off-by: Eric W. Biederman <ebiederman-1v8oiQdgUNlBDgjK7y7TUQ@public.gmane.org>
---
 fs/mount.h     |    1 +
 fs/namespace.c |   24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index e4342b8dfab1..7a6a2bb3f290 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -79,6 +79,7 @@ static inline int is_mounted(struct vfsmount *mnt)
 }
 
 extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *, int);
+extern void detach_mounts(struct dentry *dentry);
 
 static inline void get_mnt_ns(struct mnt_namespace *ns)
 {
diff --git a/fs/namespace.c b/fs/namespace.c
index e4fe22c23e00..78f7c5c9e673 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1295,6 +1295,30 @@ static int do_umount(struct mount *mnt, int flags)
 	return retval;
 }
 
+void detach_mounts(struct dentry *dentry)
+{
+	struct mountpoint *mp;
+	struct mount *mnt;
+
+	namespace_lock();
+	if (!d_mountpoint(dentry))
+		goto out_unlock;
+
+	mp = new_mountpoint(dentry);
+	if (IS_ERR(mp))
+		goto out_unlock;
+
+	br_write_lock(&vfsmount_lock);
+	while (!list_empty(&mp->m_list)) {
+		mnt = list_first_entry(&mp->m_list, struct mount, mnt_mp_list);
+		umount_tree(mnt, 1);
+	}
+	br_write_unlock(&vfsmount_lock);
+	put_mountpoint(mp);
+out_unlock:
+	namespace_unlock();
+}
+
 /* 
  * Is the caller allowed to modify his namespace?
  */
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [REVIEW][PATCH 4/4] vfs: Lazily remove mounts on unlinked files and directories. v2
       [not found]                                     ` <87iowyxpci.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
                                                         ` (2 preceding siblings ...)
  2013-10-15 20:17                                       ` [REVIEW][PATCH 3/4] vfs: Add a function to lazily unmount all mounts from any dentry. v3 Eric W. Biederman
@ 2013-10-15 20:18                                       ` Eric W. Biederman
       [not found]                                         ` <87vc0ywan7.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  3 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-15 20:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linux Containers, Kernel Mailing List, Andy Lutomirski, Al Viro,
	Linux-Fsdevel, Matthias Schniedermeyer, Linus Torvalds


With the introduction of mount namespaces and bind mounts it became
possible to access files and directories that on some paths are mount
points but are not mount points on other paths.  It is very confusing
when rm -rf somedir returns -EBUSY simply because somedir is mounted
somewhere else.  With the addition of user namespaces allowing
unprivileged mounts this condition has gone from annoying to allowing
a DOS attack on other users in the system.

The possibility for mischief is removed by updating the vfs to support
rename, unlink and rmdir on a dentry that is a mountpoint and by
lazily unmounting mountpoints on deleted dentries.

In particular this change allows rename, unlink and rmdir system calls
on a dentry without a mountpoint in the current mount namespace to
succeed, and it allows rename, unlink, and rmdir performed on a
distributed filesystem to update the vfs cache even if when there is a
mount in some namespace on the original dentry.

There are two common patterns of maintaining mounts: Mounts on trusted
paths with the parent directory of the mount point and all ancestory
directories up to / owned by root and modifiable only by root
(i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu,
cpuacct, ...}, /usr, /usr/local).  Mounts on unprivileged directories
maintained by fusermount.

In the case of mounts in trusted directories owned by root and
modifiable only by root the current parent directory permissions are
sufficient to ensure a mount point on a trusted path is not removed
or renamed by anyone other than root, even if there is a context
where the there are no mount points to prevent this.

In the case of mounts in directories owned by less privileged users
races with users modifying the path of a mount point are already a
danger.  fusermount already uses a combination of chdir,
/proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races.  The
removable of global rename, unlink, and rmdir protection really adds
nothing new to consider only a widening of the attack window, and
fusermount is already safe against unprivileged users modifying the
directory simultaneously.

In principle for perfect userspace programs returning -EBUSY for
unlink, rmdir, and rename of dentires that have mounts in the local
namespace is actually unnecessary.  Unfortunately not all userspace
programs are perfect so retaining -EBUSY for unlink, rmdir and rename
of dentries that have mounts in the current mount namespace plays an
important role of maintaining consistency with historical behavior and
making imperfect userspace applications hard to exploit.

v2: Remove spurious old_dentry.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/afs/dir.c           |    3 +-
 fs/dcache.c            |   80 ++++++++++++++++++++----------------------------
 fs/fuse/dir.c          |    3 +-
 fs/gfs2/dentry.c       |    4 +--
 fs/namei.c             |   30 ++++++------------
 fs/nfs/dir.c           |    5 +--
 fs/sysfs/dir.c         |    9 +-----
 include/linux/dcache.h |    3 +-
 8 files changed, 50 insertions(+), 87 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 646337dc5201..7fb69d45f1b9 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -686,8 +686,7 @@ not_found:
 
 out_bad:
 	/* don't unhash if we have submounts */
-	if (check_submounts_and_drop(dentry) != 0)
-		goto out_skip;
+	shrink_submounts_and_drop(dentry);
 
 	_debug("dropping dentry %s/%s",
 	       parent->d_name.name, dentry->d_name.name);
diff --git a/fs/dcache.c b/fs/dcache.c
index 41000305d716..1e9bf96b0132 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1373,7 +1373,7 @@ int d_set_mounted(struct dentry *dentry)
 	int ret = -ENOENT;
 	write_seqlock(&rename_lock);
 	for (p = dentry->d_parent; !IS_ROOT(p); p = p->d_parent) {
-		/* Need exclusion wrt. check_submounts_and_drop() */
+		/* Need exclusion wrt. shrink_submounts_and_drop() */
 		spin_lock(&p->d_lock);
 		if (unlikely(d_unhashed(p))) {
 			spin_unlock(&p->d_lock);
@@ -1478,70 +1478,56 @@ void shrink_dcache_parent(struct dentry *parent)
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 
-static enum d_walk_ret check_and_collect(void *_data, struct dentry *dentry)
+struct detach_data {
+	struct dentry *found;
+};
+static enum d_walk_ret do_detach_submounts(void *ptr, struct dentry *dentry)
 {
-	struct select_data *data = _data;
-
-	if (d_mountpoint(dentry)) {
-		data->found = -EBUSY;
-		return D_WALK_QUIT;
-	}
-
-	return select_collect(_data, dentry);
-}
+	struct detach_data *data = ptr;
 
-static void check_and_drop(void *_data)
-{
-	struct select_data *data = _data;
+	if (d_mountpoint(dentry))
+		data->found = dentry;
 
-	if (d_mountpoint(data->start))
-		data->found = -EBUSY;
-	if (!data->found)
-		__d_drop(data->start);
+	return data->found ? D_WALK_QUIT : D_WALK_CONTINUE;
 }
 
 /**
- * check_submounts_and_drop - prune dcache, check for submounts and drop
+ * detach_submounts - check for submounts and detach them.
  *
- * All done as a single atomic operation relative to has_unlinked_ancestor().
- * Returns 0 if successfully unhashed @parent.  If there were submounts then
- * return -EBUSY.
+ * @dentry: dentry to find mount points under.
  *
- * @dentry: dentry to prune and drop
+ * If dentry or any of it's children is a mount point detach those mounts.
  */
-int check_submounts_and_drop(struct dentry *dentry)
+void detach_submounts(struct dentry *dentry)
 {
-	int ret = 0;
-
-	/* Negative dentries can be dropped without further checks */
-	if (!dentry->d_inode) {
-		d_drop(dentry);
-		goto out;
-	}
-
+	struct detach_data data;
 	for (;;) {
-		struct select_data data;
-
-		INIT_LIST_HEAD(&data.dispose);
-		data.start = dentry;
-		data.found = 0;
+		data.found = NULL;
+		d_walk(dentry, &data, do_detach_submounts, NULL);
 
-		d_walk(dentry, &data, check_and_collect, check_and_drop);
-		ret = data.found;
-
-		if (!list_empty(&data.dispose))
-			shrink_dentry_list(&data.dispose);
-
-		if (ret <= 0)
+		if (!data.found)
 			break;
 
+		detach_mounts(data.found);
 		cond_resched();
 	}
+	detach_mounts(dentry);
+}
 
-out:
-	return ret;
+/**
+ * shrink_submounts_and_drop - detach submounts, prune dcache, and drop
+ *
+ * All done as a single atomic operation reletaive to d_set_mounted().
+ *
+ * @dentry: dentry to detach, prune and drop
+ */
+void shrink_submounts_and_drop(struct dentry *dentry)
+{
+	d_drop(dentry);
+	detach_submounts(dentry);
+	shrink_dcache_parent(dentry);
 }
-EXPORT_SYMBOL(check_submounts_and_drop);
+EXPORT_SYMBOL(shrink_submounts_and_drop);
 
 /**
  * __d_alloc	-	allocate a dcache entry
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 62b43b577bfc..b1cd7b79a325 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -259,8 +259,7 @@ out:
 
 invalid:
 	ret = 0;
-	if (check_submounts_and_drop(entry) != 0)
-		ret = 1;
+	shrink_submounts_and_drop(entry);
 	goto out;
 }
 
diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index d3a5d4e29ba5..2ecc2b873829 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -93,9 +93,7 @@ invalid_gunlock:
 	if (!had_lock)
 		gfs2_glock_dq_uninit(&d_gh);
 invalid:
-	if (check_submounts_and_drop(dentry) != 0)
-		goto valid;
-
+	shrink_submounts_and_drop(dentry);
 	dput(parent);
 	return 0;
 
diff --git a/fs/namei.c b/fs/namei.c
index df7bd6e9c7b6..a12c1d31d4c8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3574,10 +3574,6 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 	dget(dentry);
 	mutex_lock(&dentry->d_inode->i_mutex);
 
-	error = -EBUSY;
-	if (d_mountpoint(dentry))
-		goto out;
-
 	error = security_inode_rmdir(dir, dentry);
 	if (error)
 		goto out;
@@ -3589,6 +3585,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 
 	dentry->d_inode->i_flags |= S_DEAD;
 	dont_mount(dentry);
+	detach_mounts(dentry);
 
 out:
 	mutex_unlock(&dentry->d_inode->i_mutex);
@@ -3674,14 +3671,12 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
 		return -EPERM;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
-	if (d_mountpoint(dentry))
-		error = -EBUSY;
-	else {
-		error = security_inode_unlink(dir, dentry);
+	error = security_inode_unlink(dir, dentry);
+	if (!error) {
+		error = dir->i_op->unlink(dir, dentry);
 		if (!error) {
-			error = dir->i_op->unlink(dir, dentry);
-			if (!error)
-				dont_mount(dentry);
+			dont_mount(dentry);
+			detach_mounts(dentry);
 		}
 	}
 	mutex_unlock(&dentry->d_inode->i_mutex);
@@ -4008,10 +4003,6 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (target)
 		mutex_lock(&target->i_mutex);
 
-	error = -EBUSY;
-	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
-		goto out;
-
 	error = -EMLINK;
 	if (max_links && !target && new_dir != old_dir &&
 	    new_dir->i_nlink >= max_links)
@@ -4026,6 +4017,7 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (target) {
 		target->i_flags |= S_DEAD;
 		dont_mount(new_dentry);
+		detach_mounts(new_dentry);
 	}
 out:
 	if (target)
@@ -4051,16 +4043,14 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 	if (target)
 		mutex_lock(&target->i_mutex);
 
-	error = -EBUSY;
-	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
-		goto out;
-
 	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
 
-	if (target)
+	if (target) {
 		dont_mount(new_dentry);
+		detach_mounts(new_dentry);
+	}
 	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
 		d_move(old_dentry, new_dentry);
 out:
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 854a8f05a610..e8e35acd8850 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1142,10 +1142,7 @@ out_zap_parent:
 		if (dentry->d_flags & DCACHE_DISCONNECTED)
 			goto out_valid;
 	}
-	/* If we have submounts, don't unhash ! */
-	if (check_submounts_and_drop(dentry) != 0)
-		goto out_valid;
-
+	shrink_submounts_and_drop(dentry);
 	dput(parent);
 	dfprintk(LOOKUPCACHE, "NFS: %s(%s/%s) is invalid\n",
 			__func__, dentry->d_parent->d_name.name,
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 4d83cedb9fcb..477c66d4e2a8 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -327,7 +327,6 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, unsigned int flags)
 	}
 
 	mutex_unlock(&sysfs_mutex);
-out_valid:
 	return 1;
 out_bad:
 	/* Remove the dentry from the dcache hashes.
@@ -341,13 +340,7 @@ out_bad:
 	 * to the dcache hashes.
 	 */
 	mutex_unlock(&sysfs_mutex);
-
-	/* If we have submounts we must allow the vfs caches
-	 * to lie about the state of the filesystem to prevent
-	 * leaks and other nasty things.
-	 */
-	if (check_submounts_and_drop(dentry) != 0)
-		goto out_valid;
+	shrink_submounts_and_drop(dentry);
 
 	return 0;
 }
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 59066e0b4ff1..17948b49f3d5 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -254,7 +254,8 @@ extern void d_prune_aliases(struct inode *);
 
 /* test whether we have any submounts in a subdir tree */
 extern int have_submounts(struct dentry *);
-extern int check_submounts_and_drop(struct dentry *);
+extern void detach_submounts(struct dentry *dentry);
+extern void shrink_submounts_and_drop(struct dentry *);
 
 /*
  * This adds the entry to the hash queues.
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                       ` <87d2n6xpan.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-22 19:04                                         ` Serge E. Hallyn
  2013-11-03  3:54                                         ` Al Viro
  1 sibling, 0 replies; 16+ messages in thread
From: Serge E. Hallyn @ 2013-10-22 19:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Al Viro, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> 
> In preparation for allowing mountpoints to be renamed and unlinked
> in remote filesystems and in other mount namespaces test if on a dentry
> there is a mount in the local mount namespace before allowing it to
> be renamed or unlinked.
> 
> The primary motivation here are old versions of fusermount unmount
> which is not safe if the a path can be renamed or unlinked while it is
> verifying the mount is safe to unmount.  More recent versions are simpler
> and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
> in a directory owned by an arbitrary user.
> 
> Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> reports this is approach is good
> enough to remove concerns about new kernels mixed with old versions
> of fusermount.
> 
> A secondary motivation for restrictions here is that it removing empty
> directories that have non-empty mount points on them appears to
> violate the rule that rmdir can not remove empty directories.  As
> Linus Torvalds pointed out this is useful for programs (like git) that
> test if a directory is empty with rmdir.
> 
> Therefore this patch arranges to enforce the existing mount point
> semantics for local mount namespace.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/namei.c |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 645268f23eb6..df7bd6e9c7b6 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3547,6 +3547,20 @@ void dentry_unhash(struct dentry *dentry)
>  	spin_unlock(&dentry->d_lock);
>  }
>  
> +static bool covered(struct vfsmount *mnt, struct dentry *dentry)
> +{
> +	/* test to see if a dentry is covered with a mount in
> +	 * the current mount namespace.
> +	 */
> +	bool is_covered;
> +
> +	rcu_read_lock();
> +	is_covered = d_mountpoint(dentry) && __lookup_mnt(mnt, dentry, 1);
> +	rcu_read_unlock();
> +
> +	return is_covered;
> +}
> +
>  int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>  {
>  	int error = may_delete(dir, dentry, 1);
> @@ -3622,6 +3636,9 @@ retry:
>  		error = -ENOENT;
>  		goto exit3;
>  	}
> +	error = -EBUSY;
> +	if (covered(nd.path.mnt, dentry))
> +		goto exit3;
>  	error = security_path_rmdir(&nd.path, dentry);
>  	if (error)
>  		goto exit3;
> @@ -3716,6 +3733,9 @@ retry:
>  		inode = dentry->d_inode;
>  		if (!inode)
>  			goto slashes;
> +		error = -EBUSY;
> +		if (covered(nd.path.mnt, dentry))
> +			goto exit2;
>  		ihold(inode);
>  		error = security_path_unlink(&nd.path, dentry);
>  		if (error)
> @@ -4164,6 +4184,11 @@ retry:
>  	error = -ENOTEMPTY;
>  	if (new_dentry == trap)
>  		goto exit5;
> +	error = -EBUSY;
> +	if (covered(oldnd.path.mnt, old_dentry))
> +		goto exit5;
> +	if (covered(newnd.path.mnt, new_dentry))
> +		goto exit5;
>  
>  	error = security_path_rename(&oldnd.path, old_dentry,
>  				     &newnd.path, new_dentry);
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 2/4] vfs: Keep a list of mounts on a mount point
       [not found]                                         ` <877gdexp9s.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-22 19:06                                           ` Serge E. Hallyn
  0 siblings, 0 replies; 16+ messages in thread
From: Serge E. Hallyn @ 2013-10-22 19:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Al Viro, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> 
> To spot any possible problems call BUG if a mountpoint
> is put when it's list of mounts is not empty.
> 
> Signed-off-by: Eric W. Biederman <ebiederman-1v8oiQdgUNlBDgjK7y7TUQ@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/mount.h     |    2 ++
>  fs/namespace.c |    6 ++++++
>  2 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index 64a858143ff9..e4342b8dfab1 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -21,6 +21,7 @@ struct mnt_pcp {
>  struct mountpoint {
>  	struct list_head m_hash;
>  	struct dentry *m_dentry;
> +	struct list_head m_list;
>  	int m_count;
>  };
>  
> @@ -47,6 +48,7 @@ struct mount {
>  	struct mount *mnt_master;	/* slave is on master->mnt_slave_list */
>  	struct mnt_namespace *mnt_ns;	/* containing namespace */
>  	struct mountpoint *mnt_mp;	/* where is it mounted */
> +	struct list_head mnt_mp_list;	/* list mounts with the same mountpoint */
>  #ifdef CONFIG_FSNOTIFY
>  	struct hlist_head mnt_fsnotify_marks;
>  	__u32 mnt_fsnotify_mask;
> diff --git a/fs/namespace.c b/fs/namespace.c
> index da5c49483430..e4fe22c23e00 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -197,6 +197,7 @@ static struct mount *alloc_vfsmnt(const char *name)
>  		INIT_LIST_HEAD(&mnt->mnt_share);
>  		INIT_LIST_HEAD(&mnt->mnt_slave_list);
>  		INIT_LIST_HEAD(&mnt->mnt_slave);
> +		INIT_LIST_HEAD(&mnt->mnt_mp_list);
>  #ifdef CONFIG_FSNOTIFY
>  		INIT_HLIST_HEAD(&mnt->mnt_fsnotify_marks);
>  #endif
> @@ -636,6 +637,7 @@ static struct mountpoint *new_mountpoint(struct dentry *dentry)
>  	mp->m_dentry = dentry;
>  	mp->m_count = 1;
>  	list_add(&mp->m_hash, chain);
> +	INIT_LIST_HEAD(&mp->m_list);
>  	return mp;
>  }
>  
> @@ -643,6 +645,7 @@ static void put_mountpoint(struct mountpoint *mp)
>  {
>  	if (!--mp->m_count) {
>  		struct dentry *dentry = mp->m_dentry;
> +		BUG_ON(!list_empty(&mp->m_list));
>  		spin_lock(&dentry->d_lock);
>  		dentry->d_flags &= ~DCACHE_MOUNTED;
>  		spin_unlock(&dentry->d_lock);
> @@ -689,6 +692,7 @@ static void detach_mnt(struct mount *mnt, struct path *old_path)
>  	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
>  	list_del_init(&mnt->mnt_child);
>  	list_del_init(&mnt->mnt_hash);
> +	list_del_init(&mnt->mnt_mp_list);
>  	put_mountpoint(mnt->mnt_mp);
>  	mnt->mnt_mp = NULL;
>  }
> @@ -705,6 +709,7 @@ void mnt_set_mountpoint(struct mount *mnt,
>  	child_mnt->mnt_mountpoint = dget(mp->m_dentry);
>  	child_mnt->mnt_parent = mnt;
>  	child_mnt->mnt_mp = mp;
> +	list_add_tail(&child_mnt->mnt_mp_list, &mp->m_list);
>  }
>  
>  /*
> @@ -1191,6 +1196,7 @@ void umount_tree(struct mount *mnt, int propagate)
>  		list_del_init(&p->mnt_child);
>  		if (mnt_has_parent(p)) {
>  			p->mnt_parent->mnt_ghosts++;
> +			list_del_init(&p->mnt_mp_list);
>  			put_mountpoint(p->mnt_mp);
>  			p->mnt_mp = NULL;
>  		}
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 3/4] vfs: Add a function to lazily unmount all mounts from any dentry. v3
       [not found]                                         ` <871u3mxp8s.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-22 19:08                                           ` Serge E. Hallyn
  0 siblings, 0 replies; 16+ messages in thread
From: Serge E. Hallyn @ 2013-10-22 19:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Al Viro, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> 
> v2: Always drop the lock when exiting early.
> v3: Make detach_mounts robust about freeing several
>     mounts on the same mountpoint at one time, and remove
>     the unneeded mnt_list list test.
> 
> Signed-off-by: Eric W. Biederman <ebiederman-1v8oiQdgUNlBDgjK7y7TUQ@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/mount.h     |    1 +
>  fs/namespace.c |   24 ++++++++++++++++++++++++
>  2 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index e4342b8dfab1..7a6a2bb3f290 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -79,6 +79,7 @@ static inline int is_mounted(struct vfsmount *mnt)
>  }
>  
>  extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *, int);
> +extern void detach_mounts(struct dentry *dentry);
>  
>  static inline void get_mnt_ns(struct mnt_namespace *ns)
>  {
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e4fe22c23e00..78f7c5c9e673 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1295,6 +1295,30 @@ static int do_umount(struct mount *mnt, int flags)
>  	return retval;
>  }
>  
> +void detach_mounts(struct dentry *dentry)
> +{
> +	struct mountpoint *mp;
> +	struct mount *mnt;
> +
> +	namespace_lock();
> +	if (!d_mountpoint(dentry))
> +		goto out_unlock;
> +
> +	mp = new_mountpoint(dentry);
> +	if (IS_ERR(mp))
> +		goto out_unlock;
> +
> +	br_write_lock(&vfsmount_lock);
> +	while (!list_empty(&mp->m_list)) {
> +		mnt = list_first_entry(&mp->m_list, struct mount, mnt_mp_list);
> +		umount_tree(mnt, 1);
> +	}
> +	br_write_unlock(&vfsmount_lock);
> +	put_mountpoint(mp);
> +out_unlock:
> +	namespace_unlock();
> +}
> +
>  /* 
>   * Is the caller allowed to modify his namespace?
>   */
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 4/4] vfs: Lazily remove mounts on unlinked files and directories. v2
       [not found]                                         ` <87vc0ywan7.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-10-22 19:13                                           ` Serge E. Hallyn
  0 siblings, 0 replies; 16+ messages in thread
From: Serge E. Hallyn @ 2013-10-22 19:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Al Viro, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> 
> With the introduction of mount namespaces and bind mounts it became
> possible to access files and directories that on some paths are mount
> points but are not mount points on other paths.  It is very confusing
> when rm -rf somedir returns -EBUSY simply because somedir is mounted
> somewhere else.  With the addition of user namespaces allowing
> unprivileged mounts this condition has gone from annoying to allowing
> a DOS attack on other users in the system.
> 
> The possibility for mischief is removed by updating the vfs to support
> rename, unlink and rmdir on a dentry that is a mountpoint and by
> lazily unmounting mountpoints on deleted dentries.
> 
> In particular this change allows rename, unlink and rmdir system calls
> on a dentry without a mountpoint in the current mount namespace to
> succeed, and it allows rename, unlink, and rmdir performed on a
> distributed filesystem to update the vfs cache even if when there is a
> mount in some namespace on the original dentry.
> 
> There are two common patterns of maintaining mounts: Mounts on trusted
> paths with the parent directory of the mount point and all ancestory
> directories up to / owned by root and modifiable only by root
> (i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu,
> cpuacct, ...}, /usr, /usr/local).  Mounts on unprivileged directories
> maintained by fusermount.
> 
> In the case of mounts in trusted directories owned by root and
> modifiable only by root the current parent directory permissions are
> sufficient to ensure a mount point on a trusted path is not removed
> or renamed by anyone other than root, even if there is a context
> where the there are no mount points to prevent this.
> 
> In the case of mounts in directories owned by less privileged users
> races with users modifying the path of a mount point are already a
> danger.  fusermount already uses a combination of chdir,
> /proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races.  The
> removable of global rename, unlink, and rmdir protection really adds
> nothing new to consider only a widening of the attack window, and
> fusermount is already safe against unprivileged users modifying the
> directory simultaneously.
> 
> In principle for perfect userspace programs returning -EBUSY for
> unlink, rmdir, and rename of dentires that have mounts in the local
> namespace is actually unnecessary.  Unfortunately not all userspace
> programs are perfect so retaining -EBUSY for unlink, rmdir and rename
> of dentries that have mounts in the current mount namespace plays an
> important role of maintaining consistency with historical behavior and
> making imperfect userspace applications hard to exploit.
> 
> v2: Remove spurious old_dentry.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/afs/dir.c           |    3 +-
>  fs/dcache.c            |   80 ++++++++++++++++++++----------------------------
>  fs/fuse/dir.c          |    3 +-
>  fs/gfs2/dentry.c       |    4 +--
>  fs/namei.c             |   30 ++++++------------
>  fs/nfs/dir.c           |    5 +--
>  fs/sysfs/dir.c         |    9 +-----
>  include/linux/dcache.h |    3 +-
>  8 files changed, 50 insertions(+), 87 deletions(-)
> 
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 646337dc5201..7fb69d45f1b9 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -686,8 +686,7 @@ not_found:
>  
>  out_bad:
>  	/* don't unhash if we have submounts */
> -	if (check_submounts_and_drop(dentry) != 0)
> -		goto out_skip;
> +	shrink_submounts_and_drop(dentry);
>  
>  	_debug("dropping dentry %s/%s",
>  	       parent->d_name.name, dentry->d_name.name);
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 41000305d716..1e9bf96b0132 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1373,7 +1373,7 @@ int d_set_mounted(struct dentry *dentry)
>  	int ret = -ENOENT;
>  	write_seqlock(&rename_lock);
>  	for (p = dentry->d_parent; !IS_ROOT(p); p = p->d_parent) {
> -		/* Need exclusion wrt. check_submounts_and_drop() */
> +		/* Need exclusion wrt. shrink_submounts_and_drop() */
>  		spin_lock(&p->d_lock);
>  		if (unlikely(d_unhashed(p))) {
>  			spin_unlock(&p->d_lock);
> @@ -1478,70 +1478,56 @@ void shrink_dcache_parent(struct dentry *parent)
>  }
>  EXPORT_SYMBOL(shrink_dcache_parent);
>  
> -static enum d_walk_ret check_and_collect(void *_data, struct dentry *dentry)
> +struct detach_data {
> +	struct dentry *found;
> +};
> +static enum d_walk_ret do_detach_submounts(void *ptr, struct dentry *dentry)
>  {
> -	struct select_data *data = _data;
> -
> -	if (d_mountpoint(dentry)) {
> -		data->found = -EBUSY;
> -		return D_WALK_QUIT;
> -	}
> -
> -	return select_collect(_data, dentry);
> -}
> +	struct detach_data *data = ptr;
>  
> -static void check_and_drop(void *_data)
> -{
> -	struct select_data *data = _data;
> +	if (d_mountpoint(dentry))
> +		data->found = dentry;
>  
> -	if (d_mountpoint(data->start))
> -		data->found = -EBUSY;
> -	if (!data->found)
> -		__d_drop(data->start);
> +	return data->found ? D_WALK_QUIT : D_WALK_CONTINUE;
>  }
>  
>  /**
> - * check_submounts_and_drop - prune dcache, check for submounts and drop
> + * detach_submounts - check for submounts and detach them.
>   *
> - * All done as a single atomic operation relative to has_unlinked_ancestor().
> - * Returns 0 if successfully unhashed @parent.  If there were submounts then
> - * return -EBUSY.
> + * @dentry: dentry to find mount points under.
>   *
> - * @dentry: dentry to prune and drop
> + * If dentry or any of it's children is a mount point detach those mounts.
>   */
> -int check_submounts_and_drop(struct dentry *dentry)
> +void detach_submounts(struct dentry *dentry)
>  {
> -	int ret = 0;
> -
> -	/* Negative dentries can be dropped without further checks */
> -	if (!dentry->d_inode) {
> -		d_drop(dentry);
> -		goto out;
> -	}
> -
> +	struct detach_data data;
>  	for (;;) {
> -		struct select_data data;
> -
> -		INIT_LIST_HEAD(&data.dispose);
> -		data.start = dentry;
> -		data.found = 0;
> +		data.found = NULL;
> +		d_walk(dentry, &data, do_detach_submounts, NULL);
>  
> -		d_walk(dentry, &data, check_and_collect, check_and_drop);
> -		ret = data.found;
> -
> -		if (!list_empty(&data.dispose))
> -			shrink_dentry_list(&data.dispose);
> -
> -		if (ret <= 0)
> +		if (!data.found)
>  			break;
>  
> +		detach_mounts(data.found);
>  		cond_resched();
>  	}
> +	detach_mounts(dentry);
> +}
>  
> -out:
> -	return ret;
> +/**
> + * shrink_submounts_and_drop - detach submounts, prune dcache, and drop
> + *
> + * All done as a single atomic operation reletaive to d_set_mounted().
> + *
> + * @dentry: dentry to detach, prune and drop
> + */
> +void shrink_submounts_and_drop(struct dentry *dentry)
> +{
> +	d_drop(dentry);
> +	detach_submounts(dentry);
> +	shrink_dcache_parent(dentry);
>  }
> -EXPORT_SYMBOL(check_submounts_and_drop);
> +EXPORT_SYMBOL(shrink_submounts_and_drop);
>  
>  /**
>   * __d_alloc	-	allocate a dcache entry
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 62b43b577bfc..b1cd7b79a325 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -259,8 +259,7 @@ out:
>  
>  invalid:
>  	ret = 0;
> -	if (check_submounts_and_drop(entry) != 0)
> -		ret = 1;
> +	shrink_submounts_and_drop(entry);
>  	goto out;
>  }
>  
> diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
> index d3a5d4e29ba5..2ecc2b873829 100644
> --- a/fs/gfs2/dentry.c
> +++ b/fs/gfs2/dentry.c
> @@ -93,9 +93,7 @@ invalid_gunlock:
>  	if (!had_lock)
>  		gfs2_glock_dq_uninit(&d_gh);
>  invalid:
> -	if (check_submounts_and_drop(dentry) != 0)
> -		goto valid;
> -
> +	shrink_submounts_and_drop(dentry);
>  	dput(parent);
>  	return 0;
>  
> diff --git a/fs/namei.c b/fs/namei.c
> index df7bd6e9c7b6..a12c1d31d4c8 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3574,10 +3574,6 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>  	dget(dentry);
>  	mutex_lock(&dentry->d_inode->i_mutex);
>  
> -	error = -EBUSY;
> -	if (d_mountpoint(dentry))
> -		goto out;
> -
>  	error = security_inode_rmdir(dir, dentry);
>  	if (error)
>  		goto out;
> @@ -3589,6 +3585,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>  
>  	dentry->d_inode->i_flags |= S_DEAD;
>  	dont_mount(dentry);
> +	detach_mounts(dentry);
>  
>  out:
>  	mutex_unlock(&dentry->d_inode->i_mutex);
> @@ -3674,14 +3671,12 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  		return -EPERM;
>  
>  	mutex_lock(&dentry->d_inode->i_mutex);
> -	if (d_mountpoint(dentry))
> -		error = -EBUSY;
> -	else {
> -		error = security_inode_unlink(dir, dentry);
> +	error = security_inode_unlink(dir, dentry);
> +	if (!error) {
> +		error = dir->i_op->unlink(dir, dentry);
>  		if (!error) {
> -			error = dir->i_op->unlink(dir, dentry);
> -			if (!error)
> -				dont_mount(dentry);
> +			dont_mount(dentry);
> +			detach_mounts(dentry);
>  		}
>  	}
>  	mutex_unlock(&dentry->d_inode->i_mutex);
> @@ -4008,10 +4003,6 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
>  	if (target)
>  		mutex_lock(&target->i_mutex);
>  
> -	error = -EBUSY;
> -	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
> -		goto out;
> -
>  	error = -EMLINK;
>  	if (max_links && !target && new_dir != old_dir &&
>  	    new_dir->i_nlink >= max_links)
> @@ -4026,6 +4017,7 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
>  	if (target) {
>  		target->i_flags |= S_DEAD;
>  		dont_mount(new_dentry);
> +		detach_mounts(new_dentry);
>  	}
>  out:
>  	if (target)
> @@ -4051,16 +4043,14 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  	if (target)
>  		mutex_lock(&target->i_mutex);
>  
> -	error = -EBUSY;
> -	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
> -		goto out;
> -
>  	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>  	if (error)
>  		goto out;
>  
> -	if (target)
> +	if (target) {
>  		dont_mount(new_dentry);
> +		detach_mounts(new_dentry);
> +	}
>  	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
>  		d_move(old_dentry, new_dentry);
>  out:
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 854a8f05a610..e8e35acd8850 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1142,10 +1142,7 @@ out_zap_parent:
>  		if (dentry->d_flags & DCACHE_DISCONNECTED)
>  			goto out_valid;
>  	}
> -	/* If we have submounts, don't unhash ! */
> -	if (check_submounts_and_drop(dentry) != 0)
> -		goto out_valid;
> -
> +	shrink_submounts_and_drop(dentry);
>  	dput(parent);
>  	dfprintk(LOOKUPCACHE, "NFS: %s(%s/%s) is invalid\n",
>  			__func__, dentry->d_parent->d_name.name,
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 4d83cedb9fcb..477c66d4e2a8 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -327,7 +327,6 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, unsigned int flags)
>  	}
>  
>  	mutex_unlock(&sysfs_mutex);
> -out_valid:
>  	return 1;
>  out_bad:
>  	/* Remove the dentry from the dcache hashes.
> @@ -341,13 +340,7 @@ out_bad:
>  	 * to the dcache hashes.
>  	 */
>  	mutex_unlock(&sysfs_mutex);
> -
> -	/* If we have submounts we must allow the vfs caches
> -	 * to lie about the state of the filesystem to prevent
> -	 * leaks and other nasty things.
> -	 */
> -	if (check_submounts_and_drop(dentry) != 0)
> -		goto out_valid;
> +	shrink_submounts_and_drop(dentry);
>  
>  	return 0;
>  }
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 59066e0b4ff1..17948b49f3d5 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -254,7 +254,8 @@ extern void d_prune_aliases(struct inode *);
>  
>  /* test whether we have any submounts in a subdir tree */
>  extern int have_submounts(struct dentry *);
> -extern int check_submounts_and_drop(struct dentry *);
> +extern void detach_submounts(struct dentry *dentry);
> +extern void shrink_submounts_and_drop(struct dentry *);
>  
>  /*
>   * This adds the entry to the hash queues.
> -- 
> 1.7.5.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                       ` <87d2n6xpan.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-10-22 19:04                                         ` [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace Serge E. Hallyn
@ 2013-11-03  3:54                                         ` Al Viro
  1 sibling, 0 replies; 16+ messages in thread
From: Al Viro @ 2013-11-03  3:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

On Tue, Oct 15, 2013 at 01:16:48PM -0700, Eric W. Biederman wrote:

>  int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>  {
>  	int error = may_delete(dir, dentry, 1);
> @@ -3622,6 +3636,9 @@ retry:
>  		error = -ENOENT;
>  		goto exit3;
>  	}
> +	error = -EBUSY;
> +	if (covered(nd.path.mnt, dentry))
> +		goto exit3;

Ugh...  And it's not racy because of...?  IOW, what's to keep the return
value of covered() from getting obsolete just as it's being calculated,
let alone returned?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                         ` <20131103035406.GA8537-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2013-11-08 20:51                                           ` Eric W. Biederman
       [not found]                                             ` <87bo1u8vmf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
       [not found]                                             ` <20131108213551.GR13318@ZenIV.linux.org.uk>
  2013-11-21 20:49                                           ` Eric W. Biederman
  1 sibling, 2 replies; 16+ messages in thread
From: Eric W. Biederman @ 2013-11-08 20:51 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> writes:

> On Tue, Oct 15, 2013 at 01:16:48PM -0700, Eric W. Biederman wrote:
>
>>  int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>>  {
>>  	int error = may_delete(dir, dentry, 1);
>> @@ -3622,6 +3636,9 @@ retry:
>>  		error = -ENOENT;
>>  		goto exit3;
>>  	}
>> +	error = -EBUSY;
>> +	if (covered(nd.path.mnt, dentry))
>> +		goto exit3;
>
> Ugh...  And it's not racy because of...?  IOW, what's to keep the return
> value of covered() from getting obsolete just as it's being calculated,
> let alone returned?

The return value of d_mountpoint can be obsolete as soon as it returns
as well, so I don't see this as being significantly different.

I would like to say that any changes introduced here do not matter
because all of this is just to keep a semblance of the old semantics.
Unfortunately for me part of keeping that semblance is as much as is
reasonable preserving the existing race guarantees.

In 3.12 we create a mount with:
- The dentry->d_inode mutex held.
- The namespace_sem held.

In 3.12 we remove a mount with just the namespace_sem held.

I call covered in: do_rmdir, do_unlinkat, and renameat.

In 3.12 vfs_rmdir checks d_mountpoint with the
dentry->d_inode->i_mutex and
dentry->d_parent->d_inode->i_mutex held.

In 3.12 vfs_unlink checks d_mountpoint with the
dentry->d_inode->i_mutex and
dentry->d_parent->d_inode->i_mutex hel.d

In 3.12 vfs_rename_dir and vfs_rename_other checks d_mountpint with the
target->i_mutex, new_dir->i_mutex, and old_dir->i_mutex held.

Therefore the guarantees in 3.12 are:
- unlink versus mount races are prevented by the
  dentry->d_inode->i_mutex of the dentry being removed.
- unlink versus umount races are uninteresting.
- mount versus rename races in testing of d_mountpoint are ignored.
- umount versus rename races in testing of d_mountpoint are ignored.

So comparing this to how I have implemented covered the test is at a
slightly different location in the call path so there may be a slightly
larger race in rename.

For unlink there is a race where the mount could happen after testing
covered.  Then the unlink happens.  Then we remove the mount with
detach_mounts.

In the context of the symlink attacks against umounting of fuse I don't
see a difference.

In the only case where there is a new race (unlink versus mount) I see
a narrow window where new behavior will happen the unlink will win and
we unmount the filesystem.  So there is a vary narrow window in which
we might have a stale entry in /etc/mtab.

So after all of that analysis I don't think we care.  If we do care with
a little more work we can pass the mountpoint down and test covered with
dentry->d_inode->i_mutex held, where we test d_mountpoint in 3.12 today.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                             ` <87bo1u8vmf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-11-08 21:35                                               ` Al Viro
  0 siblings, 0 replies; 16+ messages in thread
From: Al Viro @ 2013-11-08 21:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

On Fri, Nov 08, 2013 at 12:51:52PM -0800, Eric W. Biederman wrote:

> The return value of d_mountpoint can be obsolete as soon as it returns
> as well, so I don't see this as being significantly different.

Not if the ->i_mutex of that sucker is held.  And it *is* held in
vfs_unlink/vfs_rmdir/vfs_rename.  Note that we only care about a mountpoint
being falsely assumed to be a non-mountpoint - in the other direction we
can just shrug and say that we'd won the race and got EBUSY for that.

> In 3.12 vfs_rmdir checks d_mountpoint with the
> dentry->d_inode->i_mutex and
> dentry->d_parent->d_inode->i_mutex held.
> 
> In 3.12 vfs_unlink checks d_mountpoint with the
> dentry->d_inode->i_mutex and
> dentry->d_parent->d_inode->i_mutex hel.d
> 
> In 3.12 vfs_rename_dir and vfs_rename_other checks d_mountpint with the
> target->i_mutex, new_dir->i_mutex, and old_dir->i_mutex held.
> 
> 
> Therefore the guarantees in 3.12 are:
> - unlink versus mount races are prevented by the
>   dentry->d_inode->i_mutex of the dentry being removed.
> - unlink versus umount races are uninteresting.
> - mount versus rename races in testing of d_mountpoint are ignored.

Read what you've written a few lines above.  The part about target->i_mutex
being held.

> So comparing this to how I have implemented covered the test is at a
> slightly different location in the call path so there may be a slightly
> larger race in rename.

You've got a race in unlink.  You've got a race in rename.  You've got a race
in rmdir.  And none of those had that race in 3.12 (including rename()).

BTW, could you describe the races with umount in a bit more details?  Races
with mount are simple - rmdir() sees that victim isn't a mountpoint and
proceeds, mount() sees that victim is still alive and proceeds, despite
the fact that victim is irretrievably on the way to removal.  And that's
what ->i_mutex on victim prevents, making "check for d_mountpoint / remove /
call dont_mount()" atomic wrt mount().  What is the problem you are seeing
with umount()?  rmdir() getting EBUSY because it hasn't noticed umount()
happening in parallel with it?  Legitimate behaviour, as far I can see...
Or is it about something different?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                               ` <20131108213551.GR13318-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2013-11-08 22:17                                                 ` Eric W. Biederman
       [not found]                                                   ` <87fvr61qtg.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-11-08 22:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> writes:

> On Fri, Nov 08, 2013 at 12:51:52PM -0800, Eric W. Biederman wrote:
>
>> The return value of d_mountpoint can be obsolete as soon as it returns
>> as well, so I don't see this as being significantly different.
>
> Not if the ->i_mutex of that sucker is held.  And it *is* held in
> vfs_unlink/vfs_rmdir/vfs_rename.  Note that we only care about a mountpoint
> being falsely assumed to be a non-mountpoint - in the other direction we
> can just shrug and say that we'd won the race and got EBUSY for that.

I wasn't certain of your question.  My point here was that covered() as
a mechanism is as good as d_mountpoint.  So the only potential issue
with covered() as a mechanism is where covered() is called.

Also please note old_dentry->d_inode->i_mutex is not held in rename.

>> In 3.12 vfs_rmdir checks d_mountpoint with the
>> dentry->d_inode->i_mutex and
>> dentry->d_parent->d_inode->i_mutex held.
>> 
>> In 3.12 vfs_unlink checks d_mountpoint with the
>> dentry->d_inode->i_mutex and
>> dentry->d_parent->d_inode->i_mutex hel.d
>> 
>> In 3.12 vfs_rename_dir and vfs_rename_other checks d_mountpint with the
>> target->i_mutex, new_dir->i_mutex, and old_dir->i_mutex held.
>> 
>> 
>> Therefore the guarantees in 3.12 are:
>> - unlink versus mount races are prevented by the
>>   dentry->d_inode->i_mutex of the dentry being removed.
>> - unlink versus umount races are uninteresting.
>> - mount versus rename races in testing of d_mountpoint are ignored.
>
> Read what you've written a few lines above.  The part about target->i_mutex
> being held.

That works for the rename as unlink case but we don't hold
old_dentry->d_inode->i_mutex which is what is needed to prevent a mount
on the dentry we are renaming.

>> So comparing this to how I have implemented covered the test is at a
>> slightly different location in the call path so there may be a slightly
>> larger race in rename.
>
> You've got a race in unlink.  You've got a race in rename.  You've got a race
> in rmdir.  And none of those had that race in 3.12 (including rename()).

Rename absolutely has a race in 3.12.  With very lucky timing it is
possible to mount something on directory a, simultaneously rename
a to b, and have the mount show up on b.

> BTW, could you describe the races with umount in a bit more details?  Races
> with mount are simple - rmdir() sees that victim isn't a mountpoint and
> proceeds, mount() sees that victim is still alive and proceeds, despite
> the fact that victim is irretrievably on the way to removal.  And that's
> what ->i_mutex on victim prevents, making "check for d_mountpoint / remove /
> call dont_mount()" atomic wrt mount().  What is the problem you are seeing
> with umount()?  rmdir() getting EBUSY because it hasn't noticed umount()
> happening in parallel with it?  Legitimate behaviour, as far I can see...
> Or is it about something different?

I did not say it was a problem only that it was a race.  The only case I
can see is getting a state EBUSY, and I see no problem with a that.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                                   ` <87fvr61qtg.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-11-09  8:49                                                     ` Christoph Hellwig
       [not found]                                                       ` <20131109084916.GA21413-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2013-11-09  8:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Al Viro, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

On Fri, Nov 08, 2013 at 02:17:31PM -0800, Eric W. Biederman wrote:
> > Read what you've written a few lines above.  The part about target->i_mutex
> > being held.
> 
> That works for the rename as unlink case but we don't hold
> old_dentry->d_inode->i_mutex which is what is needed to prevent a mount
> on the dentry we are renaming.

It will be held in 3.13.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                         ` <20131103035406.GA8537-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  2013-11-08 20:51                                           ` Eric W. Biederman
@ 2013-11-21 20:49                                           ` Eric W. Biederman
  1 sibling, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2013-11-21 20:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org> writes:

> On Tue, Oct 15, 2013 at 01:16:48PM -0700, Eric W. Biederman wrote:
>
>>  int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>>  {
>>  	int error = may_delete(dir, dentry, 1);
>> @@ -3622,6 +3636,9 @@ retry:
>>  		error = -ENOENT;
>>  		goto exit3;
>>  	}
>> +	error = -EBUSY;
>> +	if (covered(nd.path.mnt, dentry))
>> +		goto exit3;
>
> Ugh...  And it's not racy because of...?  IOW, what's to keep the return
> value of covered() from getting obsolete just as it's being calculated,
> let alone returned?

I have been fighting a cold off and on so I have been taking much longer
to dig through all of these issues than I would like.

Aftering having thought through all of the issues I completely agree
that this is a racy bug that needs to be fixed.

The fix needs to be holding i_mutex of the parent directory in do_mount,
and pivot_root.

We need to hold i_mutex in do_mount and pivot_root not because of this
issue but to prevent mount points being renamed before we mount on them.

With todays kernel because of races between when we lookup a mount point
and when we take locks, and which locks we take.  When mount(2) returns
the mount point can be located anywhere.  Which completely defeats
returning -EBUSY mount points from a userspace semantics perspective.

I was really hoping I could think through this and say that this was a
trivial issue that would allow my patches to be good for 3.13.  But that
is clearly not the case.  kern_path_locked(...,LOOKUP_FOLLOW,...) is
non-trivial to implement, and there are issues like having to move
get_fs_type before we take any locks to prevent deadlocks.

I almost have the issues worked through, so hopefully I can send a
rebased set of patches in a few days.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
       [not found]                                                       ` <20131109084916.GA21413-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2013-11-21 20:58                                                         ` Eric W. Biederman
  0 siblings, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2013-11-21 20:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miklos Szeredi, Linux Containers, Kernel Mailing List,
	Andy Lutomirski, Al Viro, Linux-Fsdevel, Matthias Schniedermeyer,
	Linus Torvalds

Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> writes:

> On Fri, Nov 08, 2013 at 02:17:31PM -0800, Eric W. Biederman wrote:
>> > Read what you've written a few lines above.  The part about target->i_mutex
>> > being held.
>> 
>> That works for the rename as unlink case but we don't hold
>> old_dentry->d_inode->i_mutex which is what is needed to prevent a mount
>> on the dentry we are renaming.
>
> It will be held in 3.13.

Only for files, not for directories.  And none of those locks turns out
to be good enough today to prevent the races between mount and rename.
With the result that when mount returns your mount point could be
located just about anywhere, and that is just considering renames of the
actual mountpoint itself.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-11-21 20:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <8761v7h2pt.fsf@tw-ebiederman.twitter.com>
     [not found] ` <CAJfpegtT7y-1HhbEVAMKkdQugTG_w7G_epGtQHGvQLpcZB5FVA@mail.gmail.com>
     [not found]   ` <87li281wx6.fsf_-_@xmission.com>
     [not found]     ` <87a9ioo37a.fsf_-_@xmission.com>
     [not found]       ` <20131007043919.GB10284@mail.hallyn.com>
     [not found]         ` <87vc191sf2.fsf@xmission.com>
     [not found]           ` <CALCETrUaubASZyeoNZcYxn5-GJO68=Ng=GYmas7K_OBsjM7f0Q@mail.gmail.com>
     [not found]             ` <87d2ngyb02.fsf@xmission.com>
     [not found]               ` <20131008160601.GJ14242@tucsk.piliscsaba.szeredi.hu>
     [not found]                 ` <CALCETrWw=htmgxqq=RQztdJQ-a8c3860vmJh9Ni=Hhs0TkO3YA@mail.gmail.com>
     [not found]                   ` <20131008161135.GK14242@tucsk.piliscsaba.szeredi.hu>
     [not found]                     ` <87li23trll.fsf@tw-ebiederman.twitter.com>
     [not found]                       ` <CAJfpegv+h7xh_2e-X7is9dq1Fp06A0eKwsyWFMPX=azbbDCX5Q@mail.gmail.com>
     [not found]                         ` <87vc15mjuw.fsf@xmission.com>
     [not found]                           ` <CAJfpegvF89LzAvNB0h0otv7sKoS3rewZzQKAauQx3P+rCkCcSg@mail.gmail.com>
     [not found]                             ` <87iox38fkv.fsf@xmission.com>
     [not found]                               ` <87d2nb8dxy.fsf@xmission.com>
     [not found]                                 ` <87d2nb8dxy.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-15 20:15                                   ` [REVIEW][PATCH 0/4] vfs: Detach mounts on unlink Eric W. Biederman
     [not found]                                     ` <87iowyxpci.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-15 20:16                                       ` [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace Eric W. Biederman
2013-10-15 20:17                                       ` [REVIEW][PATCH 2/4] vfs: Keep a list of mounts on a mount point Eric W. Biederman
     [not found]                                         ` <877gdexp9s.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-22 19:06                                           ` Serge E. Hallyn
2013-10-15 20:17                                       ` [REVIEW][PATCH 3/4] vfs: Add a function to lazily unmount all mounts from any dentry. v3 Eric W. Biederman
     [not found]                                         ` <871u3mxp8s.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-22 19:08                                           ` Serge E. Hallyn
2013-10-15 20:18                                       ` [REVIEW][PATCH 4/4] vfs: Lazily remove mounts on unlinked files and directories. v2 Eric W. Biederman
     [not found]                                         ` <87vc0ywan7.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-22 19:13                                           ` Serge E. Hallyn
     [not found]                                     ` <87d2n6xpan.fsf_-_@xmission.com>
     [not found]                                       ` <87d2n6xpan.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-22 19:04                                         ` [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace Serge E. Hallyn
2013-11-03  3:54                                         ` Al Viro
     [not found]                                       ` <20131103035406.GA8537@ZenIV.linux.org.uk>
     [not found]                                         ` <20131103035406.GA8537-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2013-11-08 20:51                                           ` Eric W. Biederman
     [not found]                                             ` <87bo1u8vmf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-11-08 21:35                                               ` Al Viro
     [not found]                                             ` <20131108213551.GR13318@ZenIV.linux.org.uk>
     [not found]                                               ` <20131108213551.GR13318-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2013-11-08 22:17                                                 ` Eric W. Biederman
     [not found]                                                   ` <87fvr61qtg.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-11-09  8:49                                                     ` Christoph Hellwig
     [not found]                                                       ` <20131109084916.GA21413-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2013-11-21 20:58                                                         ` Eric W. Biederman
2013-11-21 20:49                                           ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox