[patch 00/11] mount ownership and unprivileged mount syscall (v9)

* [patch 00/11] mount ownership and unprivileged mount syscall (v9)
@ 2008-03-17 20:00 Miklos Szeredi
  2008-03-17 20:00 ` [patch 01/11] unprivileged mounts: add user mounts to the kernel Miklos Szeredi
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

Andrew, Al,

Please consider adding this series to your trees.

I've been using these patches for a while on my laptop to mount fuse
filesystems as user, without any suid-root helpers.  The setup is as
follows:

- make /etc/mtab a symlink to /proc/mounts
- patch util-linux-ng with http://lkml.org/lkml/2008/1/16/103
- remove suid from mount, umount and fusermount
- add a line to /etc/fstab to bind mount ~/mnt onto itself owned by the user
- add 'fs.types.fuse.usermount_safe = 1' to /etc/sysctl.conf

Apart from '/dev/sda2' being replaced with '/dev/root' in 'mount' and
'df' outputs, I haven't experienced any problems.

Thanks,
Miklos

v8 -> v9:

 - new patch: copy mount ownership when cloning the mount namespace

v7 -> v8:

 - extend documentation of allow_usermount sysctl tunable
 - describe new unprivileged mounting in fuse.txt

v6 -> v7:

 - add '/proc/sys/fs/types/<type>/usermount_safe' tunable (new patch)
 - do not make FUSE safe by default, describe possible problems
   associated with unprivileged FUSE mounts in patch header
 - return EMFILE instead of EPERM, if maximum user mount count is exceeded
 - rename option 'nomnt' -> 'nosubmnt'
 - clean up error propagation in dup_mnt_ns
 - update util-linux-ng patch

v5 -> v6:

 - update to latest -mm
 - preliminary util-linux-ng support (will post right after this series)

v4 -> v5:

 - fold back Andrew's changes
 - fold back my update patch:
    o use fsuid instead of ruid
    o allow forced unpriv. unmounts for "safe" filesystems
    o allow mounting over special files, but not over symlinks
    o set nosuid and nodev based on lack of specific capability
 - patch header updates
 - new patch: on propagation inherit owner from parent
 - new patch: add "no submounts" mount flag

v3 -> v4:

 - simplify interface as much as possible, now only a single option
   ("user=UID") is used to control everything
 - no longer allow/deny mounting based on file/directory permissions,
   that approach does not always make sense

v1 -> v3:

 - add mount flags to set/clear mnt_flags individually
 - add "usermnt" mount flag.  If it is set, then allow unprivileged
   submounts under this mount
 - make max number of user mounts default to 1024, since now the
   usermnt flag will prevent user mounts by default

--



* [patch 01/11] unprivileged mounts: add user mounts to the kernel
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
@ 2008-03-17 20:00 ` Miklos Szeredi
  2008-03-17 20:00 ` [patch 02/11] unprivileged mounts: allow unprivileged umount Miklos Szeredi
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-add-user-mounts-to-the-kernel.patch --]
[-- Type: text/plain, Size: 8887 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

This patchset adds support for keeping mount ownership information in the
kernel, and allows unprivileged mount(2) and umount(2) in certain cases.

The mount owner has the following privileges:

  - unmount the owned mount
  - create a submount under the owned mount

The sysadmin can set the owner explicitly on mount and remount.  When an
unprivileged user creates a mount, the owner is automatically set to that
user.

The following use cases are envisioned:

1) Private namespace, with selected mounts owned by user.  E.g.
   /home/$USER is a good candidate for allowing unpriv mounts and unmounts
   within.

2) Private namespace, with all mounts owned by user and having the "nosuid"
   flag.  User can mount and umount anywhere within the namespace, but suid
   programs will not work.

3) Global namespace, with a designated directory, which is a mount owned by
   the user.  E.g.  /mnt/users/$USER is set up so that it is bind mounted onto
   itself, and set to be owned by $USER.  The user can add/remove mounts only
   under this directory.

The following extra security measures are taken for unprivileged mounts:

 - usermounts are limited by a sysctl tunable
 - force "nosuid,nodev" mount options on the created mount

This series increases the size of vmlinux by about 1.5k on x86_64.

For testing unprivileged mounts (and for other purposes) simple
mount/umount utilities are available from:

  http://www.kernel.org/pub/linux/kernel/people/mszeredi/mmount/

A preliminary patch for util-linux-ng to add the same functionality to
mount(8) and umount(8) is available here:

  http://lkml.org/lkml/2008/1/16/103


This patch:

A new mount flag, MS_SETUSER, is used to make a mount owned by a user.  If
this flag is specified, the owner is set to the current fsuid and the mount
is marked with the MNT_USER flag.  On remount the previous owner is not
preserved, and MS_SETUSER is treated as for a new mount.  The MS_SETUSER flag
is ignored on mount move.

The MNT_USER flag is not copied on any kind of mount cloning: namespace
creation, binding or propagation.  For bind mounts the cloned mount(s) are set
to MNT_USER depending on the MS_SETUSER mount flag.  In all the other cases
MNT_USER is always cleared.

For MNT_USER mounts a "user=UID" option is added to /proc/PID/mounts.  This is
compatible with how mount ownership is stored in /etc/mtab.
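
As an illustration only (this is not part of the patch), a userspace tool
could recover the owner from the options field of /proc/PID/mounts roughly
as follows.  The helper name find_mount_owner() is hypothetical, and the
sketch assumes the option appears as "user=UID" with a decimal UID, as
emitted by show_mnt_opts() in this patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: scan a comma-separated
 * mount option string for a "user=UID" entry.  Returns 1 and stores
 * the UID on success, 0 if no such option is present.
 */
int find_mount_owner(const char *opts, long *uid)
{
	const char *p = opts;

	while (p && *p) {
		if (strncmp(p, "user=", 5) == 0) {
			char *end;
			long val = strtol(p + 5, &end, 10);

			/* require a number followed by ',' or end of string */
			if (end != p + 5 && (*end == ',' || *end == '\0')) {
				*uid = val;
				return 1;
			}
		}
		/* advance to the start of the next option, if any */
		p = strchr(p, ',');
		if (p)
			p++;
	}
	return 0;
}
```

For example, find_mount_owner("rw,nosuid,nodev,user=1000", &uid) returns 1
and stores 1000 in uid, while a legacy mount without the option yields 0.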

The rationale for using MS_SETUSER and MNT_USER to distinguish "user"
mounts from "non-user" or "legacy" mounts is as follows:

  a) Mount(2) and umount(2) on legacy mounts always need the
     CAP_SYS_ADMIN capability, as opposed to user mounts, which only
     require that the mount owner matches the current fsuid.  So a
     process with fsuid=0 should not be able to mount/umount legacy
     mounts without the CAP_SYS_ADMIN capability.

  b) Legacy userspace programs may set fsuid to nonzero before calling
     mount(2).  In such an unlikely case, setting the owner without an
     explicit flag would have the unintended side effect of making the
     mount owned by the fsuid.

  c) For legacy mounts, no "user=UID" option should be shown in
     /proc/mounts for backwards compatibility.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c        |   39 +++++++++++++++++++++++++++++++--------
 fs/pnode.h            |    1 +
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    3 +++
 4 files changed, 36 insertions(+), 8 deletions(-)

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:31.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:42.000000000 +0100
@@ -521,6 +521,13 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	WARN_ON(mnt->mnt_flags & MNT_USER);
+	mnt->mnt_uid = current->fsuid;
+	mnt->mnt_flags |= MNT_USER;
+}
+
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
@@ -535,6 +542,11 @@ static struct vfsmount *clone_mnt(struct
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
 
+		/* don't copy the MNT_USER flag */
+		mnt->mnt_flags &= ~MNT_USER;
+		if (flag & CL_SETUSER)
+			set_mnt_user(mnt);
+
 		if (flag & CL_SLAVE) {
 			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
 			mnt->mnt_master = old;
@@ -739,6 +751,8 @@ static void show_mnt_opts(struct seq_fil
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
+	if (mnt->mnt_flags & MNT_USER)
+		seq_printf(m, ",user=%i", mnt->mnt_uid);
 }
 
 static void show_type(struct seq_file *m, struct super_block *sb)
@@ -1364,8 +1378,9 @@ static noinline int do_change_type(struc
  * noinline this do_mount helper to save do_mount stack space.
  */
 static noinline int do_loopback(struct nameidata *nd, char *old_name,
-				int recurse)
+				int flags)
 {
+	int clone_fl;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err = mount_is_safe(nd);
@@ -1385,11 +1400,12 @@ static noinline int do_loopback(struct n
 	if (!check_mnt(nd->path.mnt) || !check_mnt(old_nd.path.mnt))
 		goto out;
 
+	clone_fl = (flags & MS_SETUSER) ? CL_SETUSER : 0;
 	err = -ENOMEM;
-	if (recurse)
-		mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, 0);
+	if (flags & MS_REC)
+		mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, clone_fl);
 	else
-		mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, 0);
+		mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, clone_fl);
 
 	if (!mnt)
 		goto out;
@@ -1452,8 +1468,11 @@ static noinline int do_remount(struct na
 		err = change_mount_flags(nd->path.mnt, flags);
 	else
 		err = do_remount_sb(sb, flags, data, 0);
-	if (!err)
+	if (!err) {
 		nd->path.mnt->mnt_flags = mnt_flags;
+		if (flags & MS_SETUSER)
+			set_mnt_user(nd->path.mnt);
+	}
 	up_write(&sb->s_umount);
 	if (!err)
 		security_sb_post_remount(nd->path.mnt, flags, data);
@@ -1566,10 +1585,13 @@ static noinline int do_new_mount(struct 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	mnt = do_kern_mount(type, flags, name, data);
+	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	if (flags & MS_SETUSER)
+		set_mnt_user(mnt);
+
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
 }
 
@@ -1601,7 +1623,8 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	newmnt->mnt_flags = mnt_flags;
+	/* MNT_USER was set earlier */
+	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
 
@@ -1923,7 +1946,7 @@ long do_mount(char *dev_name, char *dir_
 		retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+		retval = do_loopback(&nd, dev_name, flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h	2008-03-17 20:55:31.000000000 +0100
+++ linux/fs/pnode.h	2008-03-17 20:55:42.000000000 +0100
@@ -22,6 +22,7 @@
 #define CL_MAKE_SHARED 		0x08
 #define CL_PROPAGATION 		0x10
 #define CL_PRIVATE 		0x20
+#define CL_SETUSER		0x40
 
 void set_mnt_shared(struct vfsmount *);
 void clear_mnt_shared(struct vfsmount *);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-03-17 20:55:31.000000000 +0100
+++ linux/include/linux/fs.h	2008-03-17 20:55:42.000000000 +0100
@@ -125,6 +125,7 @@ extern int dir_notify_enable;
 #define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
+#define MS_SETUSER	(1<<24) /* set mnt_uid to current user */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h	2008-03-17 20:55:31.000000000 +0100
+++ linux/include/linux/mount.h	2008-03-17 20:55:42.000000000 +0100
@@ -33,6 +33,7 @@ struct mnt_namespace;
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_IMBALANCED_WRITE_COUNT	0x200 /* just for debugging */
+#define MNT_USER	0x400
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
@@ -71,6 +72,8 @@ struct vfsmount {
 	 * are held, and all mnt_writer[]s on this mount have 0 as their ->count
 	 */
 	atomic_t __mnt_writers;
+
+	uid_t mnt_uid;			/* owner of the mount */
 };
 
 static inline struct vfsmount *mntget(struct vfsmount *mnt)

--


* [patch 02/11] unprivileged mounts: allow unprivileged umount
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
  2008-03-17 20:00 ` [patch 01/11] unprivileged mounts: add user mounts to the kernel Miklos Szeredi
@ 2008-03-17 20:00 ` Miklos Szeredi
  2008-03-17 20:00 ` [patch 03/11] unprivileged mounts: propagate error values from clone_mnt Miklos Szeredi
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-allow-unprivileged-umount.patch --]
[-- Type: text/plain, Size: 1619 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

The owner doesn't need sysadmin capabilities to call umount().

This is similar to the behavior of umount(8) on mounts having the "user=UID"
option in /etc/mtab.  The difference is that umount(8) also checks
/etc/fstab, presumably to exclude another mount on the same mountpoint.
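
The rule can be modelled in userspace as a small truth table.  This is only
a sketch of the check as described in this patch: the model_* names are
hypothetical, the capability test is passed in as a boolean instead of
calling capable(), and MODEL_MNT_USER simply reuses the 0x400 value this
series assigns to MNT_USER.

```c
#include <stdbool.h>
#include <sys/types.h>

#define MODEL_MNT_USER	0x400	/* value this series gives MNT_USER */
#define MODEL_MNT_FORCE	0x1	/* same value as MNT_FORCE for umount2(2) */

/* Hypothetical stand-in for the two vfsmount fields the check reads. */
struct model_mount {
	int flags;	/* mnt_flags */
	uid_t owner;	/* mnt_uid */
};

/*
 * Mirrors the decision described above: the sysadmin may always
 * unmount; the owner may unmount unless a forced umount is requested.
 */
bool model_permit_umount(const struct model_mount *mnt, int umount_flags,
			 bool has_cap_sys_admin, uid_t fsuid)
{
	if (has_cap_sys_admin)
		return true;

	if (umount_flags & MODEL_MNT_FORCE)
		return false;

	return (mnt->flags & MODEL_MNT_USER) && mnt->owner == fsuid;
}
```

With an owned mount (flags MODEL_MNT_USER, owner 1000), a plain umount by
fsuid 1000 is permitted, while a forced umount or an umount by a different
fsuid is refused unless the caller has CAP_SYS_ADMIN.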

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:42.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:44.000000000 +0100
@@ -1074,6 +1074,27 @@ static int do_umount(struct vfsmount *mn
 	return retval;
 }
 
+static bool is_mount_owner(struct vfsmount *mnt, uid_t uid)
+{
+	return (mnt->mnt_flags & MNT_USER) && mnt->mnt_uid == uid;
+}
+
+/*
+ * umount is permitted for
+ *  - sysadmin
+ *  - mount owner, if not forced umount
+ */
+static bool permit_umount(struct vfsmount *mnt, int flags)
+{
+	if (capable(CAP_SYS_ADMIN))
+		return true;
+
+	if (flags & MNT_FORCE)
+		return false;
+
+	return is_mount_owner(mnt, current->fsuid);
+}
+
 /*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
@@ -1097,7 +1118,7 @@ asmlinkage long sys_umount(char __user *
 		goto dput_and_out;
 
 	retval = -EPERM;
-	if (!capable(CAP_SYS_ADMIN))
+	if (!permit_umount(nd.path.mnt, flags))
 		goto dput_and_out;
 
 	retval = do_umount(nd.path.mnt, flags);

--


* [patch 03/11] unprivileged mounts: propagate error values from clone_mnt
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
  2008-03-17 20:00 ` [patch 01/11] unprivileged mounts: add user mounts to the kernel Miklos Szeredi
  2008-03-17 20:00 ` [patch 02/11] unprivileged mounts: allow unprivileged umount Miklos Szeredi
@ 2008-03-17 20:00 ` Miklos Szeredi
  2008-03-17 20:00 ` [patch 04/11] unprivileged mounts: account user mounts Miklos Szeredi
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-propagate-error-values-from-clone_mnt.patch --]
[-- Type: text/plain, Size: 5576 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow clone_mnt() to return errors other than ENOMEM.  This will be used for
returning a different error value when the number of user mounts goes over the
limit.

Fix copy_tree() to return EPERM for unbindable mounts.
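
For readers unfamiliar with the convention, the following userspace sketch
imitates the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() idiom that this patch
switches clone_mnt() and copy_tree() to.  All model_* names are
hypothetical stand-ins, the "simulate_oom" parameter exists only to
exercise the error path, and encoding an errno in a pointer value assumes
a conventional flat address space.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top MODEL_MAX_ERRNO "addresses". */
int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_mnt {
	int unbindable;
};

/* Stand-in for clone_mnt(): reports failure as an error pointer. */
struct model_mnt *model_clone(const struct model_mnt *old, int simulate_oom)
{
	struct model_mnt *mnt;

	if (simulate_oom)
		return model_err_ptr(-ENOMEM);
	mnt = malloc(sizeof(*mnt));
	if (!mnt)
		return model_err_ptr(-ENOMEM);
	*mnt = *old;
	return mnt;
}

/* Stand-in for copy_tree(): adds its own error and passes others on. */
struct model_mnt *model_copy_tree(const struct model_mnt *mnt,
				  int simulate_oom)
{
	if (mnt->unbindable)
		return model_err_ptr(-EPERM);
	return model_clone(mnt, simulate_oom);
}
```

A caller then tests the result with model_is_err() and extracts the code
with model_ptr_err(), so -EPERM and -ENOMEM reach it as distinct values
instead of both collapsing into a NULL return.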

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c |   95 ++++++++++++++++++++++++++++-----------------------------
 fs/pnode.c     |    5 +--
 2 files changed, 51 insertions(+), 49 deletions(-)

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:44.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:45.000000000 +0100
@@ -534,43 +534,44 @@ static struct vfsmount *clone_mnt(struct
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
 
-	if (mnt) {
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-
-		/* don't copy the MNT_USER flag */
-		mnt->mnt_flags &= ~MNT_USER;
-		if (flag & CL_SETUSER)
-			set_mnt_user(mnt);
-
-		if (flag & CL_SLAVE) {
-			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
-			mnt->mnt_master = old;
-			clear_mnt_shared(mnt);
-		} else if (!(flag & CL_PRIVATE)) {
-			if (flag & CL_PROPAGATION)
-				set_mnt_shared(old);
-			if (IS_MNT_SHARED(old))
-				make_mnt_peer(old, mnt);
-			if (IS_MNT_SLAVE(old))
-				list_add(&mnt->mnt_slave, &old->mnt_slave);
-			mnt->mnt_master = old->mnt_master;
-		}
-		if (flag & CL_MAKE_SHARED)
-			set_mnt_shared(mnt);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
 
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		if (flag & CL_EXPIRE) {
-			spin_lock(&vfsmount_lock);
-			if (!list_empty(&old->mnt_expire))
-				list_add(&mnt->mnt_expire, &old->mnt_expire);
-			spin_unlock(&vfsmount_lock);
-		}
+	mnt->mnt_flags = old->mnt_flags;
+	atomic_inc(&sb->s_active);
+	mnt->mnt_sb = sb;
+	mnt->mnt_root = dget(root);
+	mnt->mnt_mountpoint = mnt->mnt_root;
+	mnt->mnt_parent = mnt;
+
+	/* don't copy the MNT_USER flag */
+	mnt->mnt_flags &= ~MNT_USER;
+	if (flag & CL_SETUSER)
+		set_mnt_user(mnt);
+
+	if (flag & CL_SLAVE) {
+		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
+		mnt->mnt_master = old;
+		clear_mnt_shared(mnt);
+	} else if (!(flag & CL_PRIVATE)) {
+		if (flag & CL_PROPAGATION)
+			set_mnt_shared(old);
+		if (IS_MNT_SHARED(old))
+			make_mnt_peer(old, mnt);
+		if (IS_MNT_SLAVE(old))
+			list_add(&mnt->mnt_slave, &old->mnt_slave);
+		mnt->mnt_master = old->mnt_master;
+	}
+	if (flag & CL_MAKE_SHARED)
+		set_mnt_shared(mnt);
+
+	/* stick the duplicate mount on the same expiry list
+	 * as the original if that was on one */
+	if (flag & CL_EXPIRE) {
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_expire))
+			list_add(&mnt->mnt_expire, &old->mnt_expire);
+		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
 }
@@ -1178,11 +1179,11 @@ struct vfsmount *copy_tree(struct vfsmou
 	struct nameidata nd;
 
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
-		return NULL;
+		return ERR_PTR(-EPERM);
 
 	res = q = clone_mnt(mnt, dentry, flag);
-	if (!q)
-		goto Enomem;
+	if (IS_ERR(q))
+		goto error;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
@@ -1203,8 +1204,8 @@ struct vfsmount *copy_tree(struct vfsmou
 			nd.path.mnt = q;
 			nd.path.dentry = p->mnt_mountpoint;
 			q = clone_mnt(p, p->mnt_root, flag);
-			if (!q)
-				goto Enomem;
+			if (IS_ERR(q))
+				goto error;
 			spin_lock(&vfsmount_lock);
 			list_add_tail(&q->mnt_list, &res->mnt_list);
 			attach_mnt(q, &nd);
@@ -1212,7 +1213,7 @@ struct vfsmount *copy_tree(struct vfsmou
 		}
 	}
 	return res;
-Enomem:
+ error:
 	if (res) {
 		LIST_HEAD(umount_list);
 		spin_lock(&vfsmount_lock);
@@ -1220,7 +1221,7 @@ Enomem:
 		spin_unlock(&vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-	return NULL;
+	return q;
 }
 
 struct vfsmount *collect_mounts(struct vfsmount *mnt, struct dentry *dentry)
@@ -1422,13 +1423,13 @@ static noinline int do_loopback(struct n
 		goto out;
 
 	clone_fl = (flags & MS_SETUSER) ? CL_SETUSER : 0;
-	err = -ENOMEM;
 	if (flags & MS_REC)
 		mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, clone_fl);
 	else
 		mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, clone_fl);
 
-	if (!mnt)
+	err = PTR_ERR(mnt);
+	if (IS_ERR(mnt))
 		goto out;
 
 	err = graft_tree(mnt, nd);
@@ -2004,10 +2005,10 @@ static struct mnt_namespace *dup_mnt_ns(
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
 					CL_COPY_ALL | CL_EXPIRE);
-	if (!new_ns->root) {
+	if (IS_ERR(new_ns->root)) {
 		up_write(&namespace_sem);
 		kfree(new_ns);
-		return ERR_PTR(-ENOMEM);;
+		return ERR_CAST(new_ns->root);
 	}
 	spin_lock(&vfsmount_lock);
 	list_add_tail(&new_ns->list, &new_ns->root->mnt_list);
Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c	2008-03-17 20:55:31.000000000 +0100
+++ linux/fs/pnode.c	2008-03-17 20:55:45.000000000 +0100
@@ -306,8 +306,9 @@ int propagate_mnt(struct vfsmount *dest_
 
 		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
 
-		if (!(child = copy_tree(source, source->mnt_root, type))) {
-			ret = -ENOMEM;
+		child = copy_tree(source, source->mnt_root, type);
+		if (IS_ERR(child)) {
+			ret = PTR_ERR(child);
 			list_splice(tree_list, tmp_list.prev);
 			goto out;
 		}

--


* [patch 04/11] unprivileged mounts: account user mounts
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (2 preceding siblings ...)
  2008-03-17 20:00 ` [patch 03/11] unprivileged mounts: propagate error values from clone_mnt Miklos Szeredi
@ 2008-03-17 20:00 ` Miklos Szeredi
  2008-03-17 20:00 ` [patch 05/11] unprivileged mounts: allow unprivileged bind mounts Miklos Szeredi
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-account-user-mounts.patch --]
[-- Type: text/plain, Size: 6079 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add sysctl variables for accounting and limiting the number of user
mounts.

The maximum number of user mounts is set to 1024 by default.  This
does not in itself enable user mounts; a mount must first be set to be
owned by a user.
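
The accounting rule can be sketched in userspace as below.  This is a
simplified, single-threaded model with hypothetical model_* names: the
kernel code in this patch protects the counter with vfsmount_lock and
calls capable(CAP_SYS_ADMIN), whereas here the privilege check is passed
in as a boolean.

```c
#include <errno.h>
#include <stdbool.h>

int model_nr_user_mounts;
int model_max_user_mounts = 1024;

/*
 * Reserve one user mount.  Unprivileged callers get -EMFILE once the
 * limit is reached; privileged callers may push the count past it.
 */
int model_reserve_user_mount(bool has_cap_sys_admin)
{
	if (model_nr_user_mounts >= model_max_user_mounts &&
	    !has_cap_sys_admin)
		return -EMFILE;

	model_nr_user_mounts++;
	return 0;
}

/* Give the reservation back, e.g. when the mount is freed. */
void model_release_user_mount(void)
{
	model_nr_user_mounts--;
}
```

This also shows why nr_user_mounts may exceed max_user_mounts: the limit
is only enforced against callers lacking sysadmin privileges.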

[akpm]
 - don't use enumerated sysctls

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 Documentation/filesystems/proc.txt |    9 ++++
 fs/namespace.c                     |   67 ++++++++++++++++++++++++++++++++++---
 include/linux/fs.h                 |    3 +
 kernel/sysctl.c                    |   16 ++++++++
 4 files changed, 91 insertions(+), 4 deletions(-)

Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt	2008-03-17 20:55:31.000000000 +0100
+++ linux/Documentation/filesystems/proc.txt	2008-03-17 20:55:47.000000000 +0100
@@ -1053,6 +1053,15 @@ reaches aio-max-nr then io_setup will fa
 raising aio-max-nr does not result in the pre-allocation or re-sizing
 of any kernel data structures.
 
+nr_user_mounts and max_user_mounts
+----------------------------------
+
+These represent the number of "user" mounts and the maximum number of
+"user" mounts respectively.  User mounts may be created by
+unprivileged users.  User mounts may also be created with sysadmin
+privileges on behalf of a user, in which case nr_user_mounts may
+exceed max_user_mounts.
+
 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
 -----------------------------------------------------------
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:45.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:47.000000000 +0100
@@ -46,6 +46,9 @@ static struct list_head *mount_hashtable
 static struct kmem_cache *mnt_cache __read_mostly;
 static struct rw_semaphore namespace_sem;
 
+int nr_user_mounts;
+int max_user_mounts = 1024;
+
 /* /sys/fs */
 struct kobject *fs_kobj;
 EXPORT_SYMBOL_GPL(fs_kobj);
@@ -521,21 +524,70 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
-static void set_mnt_user(struct vfsmount *mnt)
+static void dec_nr_user_mounts(void)
+{
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts--;
+	spin_unlock(&vfsmount_lock);
+}
+
+static int reserve_user_mount(void)
+{
+	int err = 0;
+
+	spin_lock(&vfsmount_lock);
+	/*
+	 * EMFILE was error returned by mount(2) in the old days, when
+	 * the mount count was limited.  Reuse this error value to
+	 * mean, that the maximum number of user mounts has been
+	 * exceeded.
+	 */
+	if (nr_user_mounts >= max_user_mounts && !capable(CAP_SYS_ADMIN))
+		err = -EMFILE;
+	else
+		nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+	return err;
+}
+
+static void __set_mnt_user(struct vfsmount *mnt)
 {
 	WARN_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->fsuid;
 	mnt->mnt_flags |= MNT_USER;
 }
 
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	__set_mnt_user(mnt);
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+}
+
+static void clear_mnt_user(struct vfsmount *mnt)
+{
+	if (mnt->mnt_flags & MNT_USER) {
+		mnt->mnt_uid = 0;
+		mnt->mnt_flags &= ~MNT_USER;
+		dec_nr_user_mounts();
+	}
+}
+
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
 	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	struct vfsmount *mnt;
 
+	if (flag & CL_SETUSER) {
+		int err = reserve_user_mount();
+		if (err)
+			return ERR_PTR(err);
+	}
+	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
-		return ERR_PTR(-ENOMEM);
+		goto alloc_failed;
 
 	mnt->mnt_flags = old->mnt_flags;
 	atomic_inc(&sb->s_active);
@@ -547,7 +599,7 @@ static struct vfsmount *clone_mnt(struct
 	/* don't copy the MNT_USER flag */
 	mnt->mnt_flags &= ~MNT_USER;
 	if (flag & CL_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	if (flag & CL_SLAVE) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
@@ -574,6 +626,11 @@ static struct vfsmount *clone_mnt(struct
 		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
+
+ alloc_failed:
+	if (flag & CL_SETUSER)
+		dec_nr_user_mounts();
+	return ERR_PTR(-ENOMEM);
 }
 
 static inline void __mntput(struct vfsmount *mnt)
@@ -603,6 +660,7 @@ static inline void __mntput(struct vfsmo
 	 */
 	WARN_ON(atomic_read(&mnt->__mnt_writers));
 	dput(mnt->mnt_root);
+	clear_mnt_user(mnt);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
 }
@@ -1491,6 +1549,7 @@ static noinline int do_remount(struct na
 	else
 		err = do_remount_sb(sb, flags, data, 0);
 	if (!err) {
+		clear_mnt_user(nd->path.mnt);
 		nd->path.mnt->mnt_flags = mnt_flags;
 		if (flags & MS_SETUSER)
 			set_mnt_user(nd->path.mnt);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-03-17 20:55:42.000000000 +0100
+++ linux/include/linux/fs.h	2008-03-17 20:55:47.000000000 +0100
@@ -50,6 +50,9 @@ extern struct inodes_stat_t inodes_stat;
 
 extern int leases_enable, lease_break_time;
 
+extern int nr_user_mounts;
+extern int max_user_mounts;
+
 #ifdef CONFIG_DNOTIFY
 extern int dir_notify_enable;
 #endif
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c	2008-03-17 20:55:31.000000000 +0100
+++ linux/kernel/sysctl.c	2008-03-17 20:55:47.000000000 +0100
@@ -1278,6 +1278,22 @@ static struct ctl_table fs_table[] = {
 #endif	
 #endif
 	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "nr_user_mounts",
+		.data		= &nr_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "max_user_mounts",
+		.data		= &max_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= KERN_SETUID_DUMPABLE,
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,

--


* [patch 05/11] unprivileged mounts: allow unprivileged bind mounts
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (3 preceding siblings ...)
  2008-03-17 20:00 ` [patch 04/11] unprivileged mounts: account user mounts Miklos Szeredi
@ 2008-03-17 20:00 ` Miklos Szeredi
  2008-03-17 20:00 ` [patch 06/11] unprivileged mounts: allow unprivileged mounts Miklos Szeredi
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-allow-unprivileged-bind-mounts.patch --]
[-- Type: text/plain, Size: 2660 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow bind mounts to unprivileged users if the following conditions are met:

  - mountpoint is not a symlink
  - parent mount is owned by the user
  - the number of user mounts is below the maximum

Unprivileged mounts imply MS_SETUSER, and will also have the "nosuid" and
"nodev" mount flags set.

In particular, if the mounting process doesn't have the CAP_SETUID
capability, then the "nosuid" flag will be added, and if it doesn't have
the CAP_MKNOD capability, then the "nodev" flag will be added.
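
The flag computation can be sketched as a pure function.  The model_*
names are hypothetical, the capability checks are passed in as booleans,
and the MODEL_MNT_NOSUID/MODEL_MNT_NODEV values are assumed to match
MNT_NOSUID and MNT_NODEV in include/linux/mount.h; MODEL_MNT_USER uses
the 0x400 value introduced earlier in this series.

```c
#include <stdbool.h>

#define MODEL_MNT_NOSUID	0x01	/* assumed value of MNT_NOSUID */
#define MODEL_MNT_NODEV		0x02	/* assumed value of MNT_NODEV */
#define MODEL_MNT_USER		0x400	/* value this series gives MNT_USER */

/*
 * Compute the flags of a newly owned mount: always mark it as a user
 * mount, and force "nosuid"/"nodev" when the matching capability is
 * missing.
 */
int model_user_mount_flags(int mnt_flags, bool has_cap_setuid,
			   bool has_cap_mknod)
{
	mnt_flags |= MODEL_MNT_USER;

	if (!has_cap_setuid)
		mnt_flags |= MODEL_MNT_NOSUID;
	if (!has_cap_mknod)
		mnt_flags |= MODEL_MNT_NODEV;

	return mnt_flags;
}
```

A caller holding neither capability thus ends up with all three flags
set, while one holding both gets only the user-mount marker.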

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c |   44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:47.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:48.000000000 +0100
@@ -555,6 +555,11 @@ static void __set_mnt_user(struct vfsmou
 	WARN_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->fsuid;
 	mnt->mnt_flags |= MNT_USER;
+
+	if (!capable(CAP_SETUID))
+		mnt->mnt_flags |= MNT_NOSUID;
+	if (!capable(CAP_MKNOD))
+		mnt->mnt_flags |= MNT_NODEV;
 }
 
 static void set_mnt_user(struct vfsmount *mnt)
@@ -1201,22 +1206,26 @@ asmlinkage long sys_oldumount(char __use
 
 #endif
 
-static int mount_is_safe(struct nameidata *nd)
+/*
+ * Conditions for unprivileged mounts are:
+ * - mountpoint is not a symlink
+ * - mountpoint is in a mount owned by the user
+ */
+static bool permit_mount(struct nameidata *nd, int *flags)
 {
+	struct inode *inode = nd->path.dentry->d_inode;
+
 	if (capable(CAP_SYS_ADMIN))
-		return 0;
-	return -EPERM;
-#ifdef notyet
-	if (S_ISLNK(nd->path.dentry->d_inode->i_mode))
-		return -EPERM;
-	if (nd->path.dentry->d_inode->i_mode & S_ISVTX) {
-		if (current->uid != nd->path.dentry->d_inode->i_uid)
-			return -EPERM;
-	}
-	if (vfs_permission(nd, MAY_WRITE))
-		return -EPERM;
-	return 0;
-#endif
+		return true;
+
+	if (S_ISLNK(inode->i_mode))
+		return false;
+
+	if (!is_mount_owner(nd->path.mnt, current->fsuid))
+		return false;
+
+	*flags |= MS_SETUSER;
+	return true;
 }
 
 static int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
@@ -1463,9 +1472,10 @@ static noinline int do_loopback(struct n
 	int clone_fl;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
-	int err = mount_is_safe(nd);
-	if (err)
-		return err;
+	int err;
+
+	if (!permit_mount(nd, &flags))
+		return -EPERM;
 	if (!old_name || !*old_name)
 		return -EINVAL;
 	err = path_lookup(old_name, LOOKUP_FOLLOW, &old_nd);

--


* [patch 06/11] unprivileged mounts: allow unprivileged mounts
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (4 preceding siblings ...)
  2008-03-17 20:00 ` [patch 05/11] unprivileged mounts: allow unprivileged bind mounts Miklos Szeredi
@ 2008-03-17 20:00 ` Miklos Szeredi
  2008-03-17 20:01 ` [patch 07/11] unprivileged mounts: add sysctl tunable for "safe" property Miklos Szeredi
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:00 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-allow-unprivileged-mounts.patch --]
[-- Type: text/plain, Size: 6481 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

For "safe" filesystems allow unprivileged mounting and forced
unmounting.

A filesystem type is considered "safe" if mounting it by an
unprivileged user cannot cause a security problem.  This is somewhat
subjective, so setting this property is left to userspace (implemented
in the next patch).

Since most filesystems haven't been designed with unprivileged
mounting in mind, a thorough audit is recommended before setting this
property.

Make this a separate integer member in 'struct file_system_type'
instead of a flag, since that is easier to handle by sysctl code.

Move subtype handling from do_kern_mount() into do_new_mount().  All
other callers are kernel-internal and do not need subtype support.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c     |   80 +++++++++++++++++++++++++++++++++++++++++++----------
 fs/super.c         |   26 -----------------
 include/linux/fs.h |    1 
 3 files changed, 67 insertions(+), 40 deletions(-)
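
As a reading aid, the permission rules after this patch can be modelled
in plain userspace C.  This is only a sketch: 'struct mount_request' and
the model_*() names are made up for illustration and are not part of
the patch; the booleans stand in for capable(CAP_SYS_ADMIN),
type->fs_safe and is_mount_owner().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; field names are illustrative only. */
struct mount_request {
	bool has_cap_sys_admin;		/* models capable(CAP_SYS_ADMIN) */
	bool fs_safe;			/* models type->fs_safe != 0 */
	bool mountpoint_is_symlink;	/* models S_ISLNK(inode->i_mode) */
	bool caller_owns_parent;	/* models is_mount_owner(...) */
};

/* Same order of checks as permit_mount() for a new (non-bind) mount. */
bool model_permit_mount(const struct mount_request *r)
{
	if (r->has_cap_sys_admin)
		return true;
	if (!r->fs_safe)
		return false;
	if (r->mountpoint_is_symlink)
		return false;
	return r->caller_owns_parent;
}

/* Same rules as permit_umount(): a forced umount needs a "safe" fs. */
bool model_permit_umount(bool has_cap_sys_admin, bool forced,
			 bool fs_safe, bool caller_owns_mount)
{
	if (has_cap_sys_admin)
		return true;
	if (forced && !fs_safe)
		return false;
	return caller_owns_mount;
}

/* Small usage example: an unprivileged owner on a "safe" filesystem. */
void example_usage(void)
{
	struct mount_request r = { false, true, false, true };

	assert(model_permit_mount(&r));
	assert(model_permit_umount(false, true, true, true));
}
```

Note that for bind mounts the real permit_mount() is called with a NULL
type, so the fs_safe test is skipped there; the model above only covers
the do_new_mount() path.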

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:48.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:49.000000000 +0100
@@ -1146,14 +1146,16 @@ static bool is_mount_owner(struct vfsmou
 /*
  * umount is permitted for
  *  - sysadmin
- *  - mount owner, if not forced umount
+ *  - mount owner
+ *    o if not forced umount,
+ *    o if forced umount, and filesystem is "safe"
  */
 static bool permit_umount(struct vfsmount *mnt, int flags)
 {
 	if (capable(CAP_SYS_ADMIN))
 		return true;
 
-	if (flags & MNT_FORCE)
+	if ((flags & MNT_FORCE) && !(mnt->mnt_sb->s_type->fs_safe))
 		return false;
 
 	return is_mount_owner(mnt, current->fsuid);
@@ -1211,13 +1213,17 @@ asmlinkage long sys_oldumount(char __use
  * - mountpoint is not a symlink
  * - mountpoint is in a mount owned by the user
  */
-static bool permit_mount(struct nameidata *nd, int *flags)
+static bool permit_mount(struct nameidata *nd, struct file_system_type *type,
+			 int *flags)
 {
 	struct inode *inode = nd->path.dentry->d_inode;
 
 	if (capable(CAP_SYS_ADMIN))
 		return true;
 
+	if (type && !type->fs_safe)
+		return false;
+
 	if (S_ISLNK(inode->i_mode))
 		return false;
 
@@ -1474,7 +1480,7 @@ static noinline int do_loopback(struct n
 	struct vfsmount *mnt = NULL;
 	int err;
 
-	if (!permit_mount(nd, &flags))
+	if (!permit_mount(nd, NULL, &flags))
 		return -EPERM;
 	if (!old_name || !*old_name)
 		return -EINVAL;
@@ -1659,31 +1665,77 @@ out:
 	return err;
 }
 
+static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
+{
+	int err;
+	const char *subtype = strchr(fstype, '.');
+	if (subtype) {
+		subtype++;
+		err = -EINVAL;
+		if (!subtype[0])
+			goto err;
+	} else
+		subtype = "";
+
+	mnt->mnt_sb->s_subtype = kstrdup(subtype, GFP_KERNEL);
+	err = -ENOMEM;
+	if (!mnt->mnt_sb->s_subtype)
+		goto err;
+	return mnt;
+
+ err:
+	mntput(mnt);
+	return ERR_PTR(err);
+}
+
 /*
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  * noinline this do_mount helper to save do_mount stack space.
  */
-static noinline int do_new_mount(struct nameidata *nd, char *type, int flags,
+static noinline int do_new_mount(struct nameidata *nd, char *fstype, int flags,
 			int mnt_flags, char *name, void *data)
 {
+	int err;
 	struct vfsmount *mnt;
+	struct file_system_type *type;
 
-	if (!type || !memchr(type, 0, PAGE_SIZE))
+	if (!fstype || !memchr(fstype, 0, PAGE_SIZE))
 		return -EINVAL;
 
-	/* we need capabilities... */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
-	if (IS_ERR(mnt))
+	type = get_fs_type(fstype);
+	if (!type)
+		return -ENODEV;
+
+	err = -EPERM;
+	if (!permit_mount(nd, type, &flags))
+		goto out_put_filesystem;
+
+	if (flags & MS_SETUSER) {
+		err = reserve_user_mount();
+		if (err)
+			goto out_put_filesystem;
+	}
+
+	mnt = vfs_kern_mount(type, flags & ~MS_SETUSER, name, data);
+	if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
+	    !mnt->mnt_sb->s_subtype)
+		mnt = fs_set_subtype(mnt, fstype);
+	put_filesystem(type);
+	if (IS_ERR(mnt)) {
+		if (flags & MS_SETUSER)
+			dec_nr_user_mounts();
 		return PTR_ERR(mnt);
+	}
 
 	if (flags & MS_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
+
+ out_put_filesystem:
+	put_filesystem(type);
+	return err;
 }
 
 /*
@@ -1714,7 +1766,7 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	/* MNT_USER was set earlier */
+	/* some flags may have been set earlier */
 	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c	2008-03-17 20:55:31.000000000 +0100
+++ linux/fs/super.c	2008-03-17 20:55:49.000000000 +0100
@@ -927,29 +927,6 @@ out:
 
 EXPORT_SYMBOL_GPL(vfs_kern_mount);
 
-static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
-{
-	int err;
-	const char *subtype = strchr(fstype, '.');
-	if (subtype) {
-		subtype++;
-		err = -EINVAL;
-		if (!subtype[0])
-			goto err;
-	} else
-		subtype = "";
-
-	mnt->mnt_sb->s_subtype = kstrdup(subtype, GFP_KERNEL);
-	err = -ENOMEM;
-	if (!mnt->mnt_sb->s_subtype)
-		goto err;
-	return mnt;
-
- err:
-	mntput(mnt);
-	return ERR_PTR(err);
-}
-
 struct vfsmount *
 do_kern_mount(const char *fstype, int flags, const char *name, void *data)
 {
@@ -958,9 +935,6 @@ do_kern_mount(const char *fstype, int fl
 	if (!type)
 		return ERR_PTR(-ENODEV);
 	mnt = vfs_kern_mount(type, flags, name, data);
-	if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
-	    !mnt->mnt_sb->s_subtype)
-		mnt = fs_set_subtype(mnt, fstype);
 	put_filesystem(type);
 	return mnt;
 }
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-03-17 20:55:47.000000000 +0100
+++ linux/include/linux/fs.h	2008-03-17 20:55:49.000000000 +0100
@@ -1484,6 +1484,7 @@ int sync_inode(struct inode *inode, stru
 struct file_system_type {
 	const char *name;
 	int fs_flags;
+	int fs_safe;
 	int (*get_sb) (struct file_system_type *, int,
 		       const char *, void *, struct vfsmount *);
 	void (*kill_sb) (struct super_block *);

--


* [patch 07/11] unprivileged mounts: add sysctl tunable for "safe" property
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (5 preceding siblings ...)
  2008-03-17 20:00 ` [patch 06/11] unprivileged mounts: allow unprivileged mounts Miklos Szeredi
@ 2008-03-17 20:01 ` Miklos Szeredi
  2008-03-17 20:01 ` [patch 08/11] unprivileged mounts: make fuse safe Miklos Szeredi
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:01 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-add-sysctl-tunable-for-safe-property.patch --]
[-- Type: text/plain, Size: 5272 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

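Add a per-filesystem-type sysctl tunable that exposes the "safe"
property ('fs_safe' in 'struct file_system_type') to userspace:

  /proc/sys/fs/types/${FS_TYPE}/usermount_safe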

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 Documentation/filesystems/proc.txt |   31 +++++++++++++++++++
 fs/filesystems.c                   |   60 +++++++++++++++++++++++++++++++++++++
 include/linux/fs.h                 |    1 
 3 files changed, 92 insertions(+)
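
For illustration, a userspace tool could locate and read the tunable
roughly as follows.  This is a sketch under assumptions: the helper
names usermount_safe_path() and read_usermount_safe() are hypothetical,
and the file only exists on a kernel carrying this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path of the per-type tunable added by this patch.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int usermount_safe_path(const char *fstype, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/sys/fs/types/%s/usermount_safe",
			 fstype);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the current value of the tunable.  Returns -1 if the file
 * cannot be opened or parsed (e.g. kernel without this series).
 */
int read_usermount_safe(const char *fstype)
{
	char path[256];
	int value;
	FILE *f;

	if (usermount_safe_path(fstype, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Small usage example: the path for the "fuse" type. */
void example_usage(void)
{
	char buf[128];

	assert(usermount_safe_path("fuse", buf, sizeof(buf)) == 0);
	assert(strcmp(buf, "/proc/sys/fs/types/fuse/usermount_safe") == 0);
}
```

Since the table entry uses mode 0644 and proc_dointvec, writing the
value is expected to need root, while reading is open to everyone.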

Index: linux/fs/filesystems.c
===================================================================
--- linux.orig/fs/filesystems.c	2008-03-17 20:55:30.000000000 +0100
+++ linux/fs/filesystems.c	2008-03-17 20:55:50.000000000 +0100
@@ -12,6 +12,7 @@
 #include <linux/kmod.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/sysctl.h>
 #include <asm/uaccess.h>
 
 /*
@@ -51,6 +52,57 @@ static struct file_system_type **find_fi
 	return p;
 }
 
+#define MAX_FILESYSTEM_VARS 1
+
+struct filesystem_sysctl_table {
+	struct ctl_table_header *header;
+	struct ctl_table table[MAX_FILESYSTEM_VARS + 1];
+};
+
+/*
+ * Create /proc/sys/fs/types/${FSNAME} directory with per fs-type tunables.
+ */
+static int filesystem_sysctl_register(struct file_system_type *fs)
+{
+	struct filesystem_sysctl_table *t;
+	struct ctl_path path[] = {
+		{ .procname = "fs", .ctl_name = CTL_FS },
+		{ .procname = "types", .ctl_name = CTL_UNNUMBERED },
+		{ .procname = fs->name, .ctl_name = CTL_UNNUMBERED },
+		{ }
+	};
+
+	t = kzalloc(sizeof(*t), GFP_KERNEL);
+	if (!t)
+		return -ENOMEM;
+
+
+	t->table[0].ctl_name = CTL_UNNUMBERED;
+	t->table[0].procname = "usermount_safe";
+	t->table[0].maxlen = sizeof(int);
+	t->table[0].data = &fs->fs_safe;
+	t->table[0].mode = 0644;
+	t->table[0].proc_handler = &proc_dointvec;
+
+	t->header = register_sysctl_paths(path, t->table);
+	if (!t->header) {
+		kfree(t);
+		return -ENOMEM;
+	}
+
+	fs->sysctl_table = t;
+
+	return 0;
+}
+
+static void filesystem_sysctl_unregister(struct file_system_type *fs)
+{
+	struct filesystem_sysctl_table *t = fs->sysctl_table;
+
+	unregister_sysctl_table(t->header);
+	kfree(t);
+}
+
 /**
  *	register_filesystem - register a new filesystem
  *	@fs: the file system structure
@@ -80,6 +132,13 @@ int register_filesystem(struct file_syst
 	else
 		*p = fs;
 	write_unlock(&file_systems_lock);
+
+	if (res == 0) {
+		res = filesystem_sysctl_register(fs);
+		if (res != 0)
+			unregister_filesystem(fs);
+	}
+
 	return res;
 }
 
@@ -108,6 +167,7 @@ int unregister_filesystem(struct file_sy
 			*tmp = fs->next;
 			fs->next = NULL;
 			write_unlock(&file_systems_lock);
+			filesystem_sysctl_unregister(fs);
 			return 0;
 		}
 		tmp = &(*tmp)->next;
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-03-17 20:55:49.000000000 +0100
+++ linux/include/linux/fs.h	2008-03-17 20:55:50.000000000 +0100
@@ -1491,6 +1491,7 @@ struct file_system_type {
 	struct module *owner;
 	struct file_system_type * next;
 	struct list_head fs_supers;
+	struct filesystem_sysctl_table *sysctl_table;
 
 	struct lock_class_key s_lock_key;
 	struct lock_class_key s_umount_key;
Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt	2008-03-17 20:55:47.000000000 +0100
+++ linux/Documentation/filesystems/proc.txt	2008-03-17 20:55:50.000000000 +0100
@@ -44,6 +44,7 @@ Table of Contents
   2.14	/proc/<pid>/io - Display the IO accounting fields
   2.15	/proc/<pid>/coredump_filter - Core dump filtering settings
   2.16	/proc/<pid>/mountinfo - Information about mounts
+  2.17	/proc/sys/fs/types - File system type specific parameters
 
 ------------------------------------------------------------------------------
 Preface
@@ -2392,4 +2393,34 @@ For more information see:
 
   Documentation/filesystems/sharedsubtree.txt
 
+2.17 /proc/sys/fs/types/ - File system type specific parameters
+----------------------------------------------------------------
+
+There's a separate directory /proc/sys/fs/types/<type>/ for each
+filesystem type, containing the following files:
+
+usermount_safe
+--------------
+
+Setting this to non-zero will allow filesystems of this type to be
+mounted by unprivileged users (note that there are other
+prerequisites as well).
+
+Fuse has been designed to be as safe as possible, and some
+distributions already ship with unprivileged fuse mounts enabled by
+default.  There are still some situations (multi-user systems with
+untrusted users in particular) where enabling this for fuse might not
+be appropriate.  For more details, see Documentation/filesystems/fuse.txt
+
+Procfs is also safe, but unprivileged mounting of it is not usually
+necessary (bind mounting is equivalent).
+
+Most other filesystems are unsafe.  Here are just some of the issues
+that must be resolved before a filesystem can be declared safe:
+
+ - no strict input checking (buffer overruns, directory loops, etc)
+ - network filesystem deadlocks when mounting from localhost
+ - no permission checking when opening the device
+ - changing mount options when mounting a new instance of a filesystem
+
 ------------------------------------------------------------------------------

--


* [patch 08/11] unprivileged mounts: make fuse safe
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (6 preceding siblings ...)
  2008-03-17 20:01 ` [patch 07/11] unprivileged mounts: add sysctl tunable for "safe" property Miklos Szeredi
@ 2008-03-17 20:01 ` Miklos Szeredi
  2008-03-17 20:01 ` [patch 09/11] unprivileged mounts: propagation: inherit owner from parent Miklos Szeredi
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:01 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-allow-unprivileged-fuse-mounts.patch --]
[-- Type: text/plain, Size: 6459 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Don't require the "user_id=" and "group_id=" options for unprivileged mounts,
but if they are present, verify them for sanity.

Disallow the "allow_other" option for unprivileged mounts.

Document new way of enabling unprivileged mounts for fuse.

Document problems with unprivileged mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 Documentation/filesystems/fuse.txt |   88 ++++++++++++++++++++++++++++++++++---
 fs/fuse/inode.c                    |   21 ++++++++
 2 files changed, 103 insertions(+), 6 deletions(-)
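
The option handling described above can be modelled in userspace C as
follows.  This is a sketch only: 'struct id_opt', resolve_user_id() and
allow_other_permitted() are hypothetical names, 'privileged' stands in
for capable(CAP_SYS_ADMIN), and the model handles a single "user_id="
option (the same logic applies to "group_id=").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the user_id/user_id_present pair. */
struct id_opt {
	bool present;
	unsigned int value;
};

/*
 * Unprivileged callers start from their own uid; an explicit
 * "user_id=" option is then accepted only if it matches.
 * Returns false where parse_fuse_opt() would reject the option.
 */
bool resolve_user_id(bool privileged, unsigned int current_uid,
		     bool opt_given, unsigned int opt_value,
		     struct id_opt *id)
{
	id->present = false;
	id->value = 0;

	if (!privileged) {
		id->value = current_uid;
		id->present = true;
	}

	if (opt_given) {
		if (id->present && id->value != opt_value)
			return false;
		id->value = opt_value;
		id->present = true;
	}
	return true;
}

/* "allow_other" is refused for unprivileged mounts (-EPERM). */
bool allow_other_permitted(bool privileged, bool allow_other)
{
	return privileged || !allow_other;
}

/* Small usage example: uid 1000 mounting without any id option. */
void example_usage(void)
{
	struct id_opt id;

	assert(resolve_user_id(false, 1000, false, 0, &id));
	assert(id.present && id.value == 1000);
}
```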

Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c	2008-03-17 20:55:30.000000000 +0100
+++ linux/fs/fuse/inode.c	2008-03-17 20:55:51.000000000 +0100
@@ -373,6 +373,19 @@ static int parse_fuse_opt(char *opt, str
 	d->max_read = ~0;
 	d->blksize = FUSE_DEFAULT_BLKSIZE;
 
+	/*
+	 * For unprivileged mounts use current uid/gid.  Still allow
+	 * "user_id" and "group_id" options for compatibility, but
+	 * only if they match these values.
+	 */
+	if (!capable(CAP_SYS_ADMIN)) {
+		d->user_id = current->uid;
+		d->user_id_present = 1;
+		d->group_id = current->gid;
+		d->group_id_present = 1;
+
+	}
+
 	while ((p = strsep(&opt, ",")) != NULL) {
 		int token;
 		int value;
@@ -401,6 +414,8 @@ static int parse_fuse_opt(char *opt, str
 		case OPT_USER_ID:
 			if (match_int(&args[0], &value))
 				return 0;
+			if (d->user_id_present && d->user_id != value)
+				return 0;
 			d->user_id = value;
 			d->user_id_present = 1;
 			break;
@@ -408,6 +423,8 @@ static int parse_fuse_opt(char *opt, str
 		case OPT_GROUP_ID:
 			if (match_int(&args[0], &value))
 				return 0;
+			if (d->group_id_present && d->group_id != value)
+				return 0;
 			d->group_id = value;
 			d->group_id_present = 1;
 			break;
@@ -630,6 +647,10 @@ static int fuse_fill_super(struct super_
 	if (!parse_fuse_opt((char *) data, &d, is_bdev))
 		return -EINVAL;
 
+	/* This is a privileged option */
+	if ((d.flags & FUSE_ALLOW_OTHER) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (is_bdev) {
 #ifdef CONFIG_BLOCK
 		if (!sb_set_blocksize(sb, d.blksize))
Index: linux/Documentation/filesystems/fuse.txt
===================================================================
--- linux.orig/Documentation/filesystems/fuse.txt	2008-03-17 20:55:30.000000000 +0100
+++ linux/Documentation/filesystems/fuse.txt	2008-03-17 20:55:51.000000000 +0100
@@ -215,11 +215,87 @@ the filesystem.  There are several ways 
   - Abort filesystem through the FUSE control filesystem.  Most
     powerful method, always works.
 
-How do non-privileged mounts work?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Unprivileged fuse mounts
+~~~~~~~~~~~~~~~~~~~~~~~~
 
-Since the mount() system call is a privileged operation, a helper
-program (fusermount) is needed, which is installed setuid root.
+Possible problems with unprivileged fuse mounts
+-----------------------------------------------
+
+FUSE was designed from the beginning to be safe for unprivileged
+users.  This has also been verified in practice over many years, with
+some distributions enabling unprivileged FUSE mounts by default.
+
+However, there are cases when unprivileged mounting of a fuse filesystem
+may be problematic, particularly for multi-user systems with untrusted
+users.  So here are a few words of warning:
+
+Due to the design of the process freezer, a hanging (due to network
+problems, etc) or malicious filesystem may prevent suspend to RAM
+or hibernation from succeeding.  This is not actually unique to FUSE, as
+any hanging network filesystem will have the same effect.
+
+It is not always possible to use kill(2) (not even with SIGKILL) to
+terminate a process using a FUSE filesystem (see section "Interrupting
+filesystem operations" above).  As a special case of the above,
+killing a self-deadlocked FUSE process is not possible, and even
+killall5 will not terminate it.
+
+If the above could pose a threat to the system, it is recommended
+that unprivileged fuse mounts not be enabled.
+
+Ways of enabling user mounts
+----------------------------
+
+Now there are two different ways of allowing unprivileged fuse mounts:
+
+ 1) new way: unprivileged mount syscall
+
+ 2) old way: suid-root fusermount utility
+
+Unprivileged mount syscall
+--------------------------
+
+To enable this do
+
+  echo 1 > /proc/sys/fs/types/fuse/usermount_safe
+
+or add this line to /etc/sysctl.conf:
+
+  fs.types.fuse.usermount_safe = 1
+
+More information can be found in Documentation/filesystems/proc.txt
+under the /proc/sys/fs/types/ heading.  Also see the description of
+nr_user_mounts and max_user_mounts under /proc/sys/fs.
+
+This doesn't in itself allow users to create mounts: first root needs
+to create a mount owned by the user, under which the user can create
+submounts.
+
+For example, to enable submounts under /home/xyz/mnt do:
+
+  mount --bind -ouser=xyz /home/xyz/mnt /home/xyz/mnt
+
+or add this line to /etc/fstab:
+
+  /home/xyz/mnt  /home/xyz/mnt  none  bind,user=xyz  0 0
+
+And finally, make sure that the user has read and write permissions
+on /dev/fuse (installing fuse should have already taken care of this):
+
+  chmod 0666 /dev/fuse
+
+or create a file under /etc/udev/rules.d/ containing:
+
+  KERNEL=="fuse", MODE="0666"
+
+After this, mounting fuse filesystems under ~xyz/mnt should work, even
+if fusermount is not installed setuid-root.
+
+Suid-root fusermount utility
+----------------------------
+
+[Some of the details described here apply to the new, unprivileged
+mount system call as well].
 
 The implication of providing non-privileged mounts is that the mount
 owner must not be able to use this capability to compromise the
@@ -235,7 +311,7 @@ system.  Obvious requirements arising fr
     other users' or the super user's processes
 
 How are requirements fulfilled?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- - - - - - - - - - - - - - - -
 
  A) The mount owner could gain elevated privileges by either:
 
@@ -300,7 +376,7 @@ How are requirements fulfilled?
 	filesystem, since SIGSTOP can be used to get a similar effect.
 
 I think these limitations are unacceptable?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- - - - - - - - - - - - - - - - - - - - - -
 
 If a sysadmin trusts the users enough, or can ensure through other
 measures, that system processes will never enter non-privileged

--


* [patch 09/11] unprivileged mounts: propagation: inherit owner from parent
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (7 preceding siblings ...)
  2008-03-17 20:01 ` [patch 08/11] unprivileged mounts: make fuse safe Miklos Szeredi
@ 2008-03-17 20:01 ` Miklos Szeredi
  2008-03-17 20:01 ` [patch 10/11] unprivileged mounts: add "no submounts" flag Miklos Szeredi
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:01 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-propagation-inherit-owner-from-parent.patch --]
[-- Type: text/plain, Size: 7690 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

On mount propagation, let the owner of the clone be inherited from the
parent into which it has been propagated.

If the parent has the "nosuid" flag, set this flag for the child as
well.  This is needed for the suid-less namespace (use case #2 in the
first patch header), where all mounts are owned by the user and have
the nosuid flag set.  In this case the propagated mount needs to have
nosuid, otherwise a suid executable may be misused by the user.

Similar treatment is not needed for "nodev", because devices can't be
abused this way: the user is not able to gain privileges to devices by
rearranging the mount namespace.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c     |   40 +++++++++++++++++++++++++---------------
 fs/pnode.c         |   19 ++++++++++++++++---
 fs/pnode.h         |    2 ++
 include/linux/fs.h |    1 -
 4 files changed, 43 insertions(+), 19 deletions(-)
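
The way propagate_mnt() derives the clone flags and owner from the
destination mount can be sketched in userspace C.  CL_SETUSER and
CL_NOSUID use the values from fs/pnode.h in this series; the MODEL_MNT_*
values, 'struct clone_params' and propagation_clone_params() are
placeholders invented for this illustration.

```c
#include <assert.h>

#define CL_SETUSER	0x40	/* value from fs/pnode.h in this series */
#define CL_NOSUID	0x80	/* value added by this patch */

/* Placeholder flag bits for the model, not the kernel's definitions. */
#define MODEL_MNT_NOSUID	0x01
#define MODEL_MNT_USER		0x1000

/* Hypothetical result type: arguments later passed to copy_tree(). */
struct clone_params {
	int clflags;
	unsigned int owner;
};

/*
 * If the mount being propagated into is a user mount, the clone
 * inherits its owner, and also "nosuid" if the parent has it.
 * For non-user parents the flags are left alone and owner stays 0.
 */
struct clone_params propagation_clone_params(int base_clflags,
					     int parent_mnt_flags,
					     unsigned int parent_uid)
{
	struct clone_params p = { base_clflags, 0 };

	if (parent_mnt_flags & MODEL_MNT_USER) {
		p.clflags |= CL_SETUSER;
		p.owner = parent_uid;
		if (parent_mnt_flags & MODEL_MNT_NOSUID)
			p.clflags |= CL_NOSUID;
	}
	return p;
}

/* Small usage example: user mount with nosuid, owned by uid 1000. */
void example_usage(void)
{
	struct clone_params p = propagation_clone_params(
		0, MODEL_MNT_USER | MODEL_MNT_NOSUID, 1000);

	assert(p.clflags == (CL_SETUSER | CL_NOSUID));
	assert(p.owner == 1000);
}
```

As in the patch, "nosuid" is only forced onto the clone when the parent
is a user mount; a root-owned nosuid parent does not trigger it.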

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:49.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:52.000000000 +0100
@@ -550,10 +550,10 @@ static int reserve_user_mount(void)
 	return err;
 }
 
-static void __set_mnt_user(struct vfsmount *mnt)
+static void __set_mnt_user(struct vfsmount *mnt, uid_t owner)
 {
 	WARN_ON(mnt->mnt_flags & MNT_USER);
-	mnt->mnt_uid = current->fsuid;
+	mnt->mnt_uid = owner;
 	mnt->mnt_flags |= MNT_USER;
 
 	if (!capable(CAP_SETUID))
@@ -564,7 +564,7 @@ static void __set_mnt_user(struct vfsmou
 
 static void set_mnt_user(struct vfsmount *mnt)
 {
-	__set_mnt_user(mnt);
+	__set_mnt_user(mnt, current->fsuid);
 	spin_lock(&vfsmount_lock);
 	nr_user_mounts++;
 	spin_unlock(&vfsmount_lock);
@@ -580,7 +580,7 @@ static void clear_mnt_user(struct vfsmou
 }
 
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
-					int flag)
+					int flag, uid_t owner)
 {
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt;
@@ -604,7 +604,10 @@ static struct vfsmount *clone_mnt(struct
 	/* don't copy the MNT_USER flag */
 	mnt->mnt_flags &= ~MNT_USER;
 	if (flag & CL_SETUSER)
-		__set_mnt_user(mnt);
+		__set_mnt_user(mnt, owner);
+
+	if (flag & CL_NOSUID)
+		mnt->mnt_flags |= MNT_NOSUID;
 
 	if (flag & CL_SLAVE) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
@@ -1246,7 +1249,7 @@ static int lives_below_in_same_fs(struct
 }
 
 struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
-					int flag)
+					int flag, uid_t owner)
 {
 	struct vfsmount *res, *p, *q, *r, *s;
 	struct nameidata nd;
@@ -1254,7 +1257,7 @@ struct vfsmount *copy_tree(struct vfsmou
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
 		return ERR_PTR(-EPERM);
 
-	res = q = clone_mnt(mnt, dentry, flag);
+	res = q = clone_mnt(mnt, dentry, flag, owner);
 	if (IS_ERR(q))
 		goto error;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
@@ -1276,7 +1279,7 @@ struct vfsmount *copy_tree(struct vfsmou
 			p = s;
 			nd.path.mnt = q;
 			nd.path.dentry = p->mnt_mountpoint;
-			q = clone_mnt(p, p->mnt_root, flag);
+			q = clone_mnt(p, p->mnt_root, flag, owner);
 			if (IS_ERR(q))
 				goto error;
 			spin_lock(&vfsmount_lock);
@@ -1301,7 +1304,7 @@ struct vfsmount *collect_mounts(struct v
 {
 	struct vfsmount *tree;
 	down_read(&namespace_sem);
-	tree = copy_tree(mnt, dentry, CL_COPY_ALL | CL_PRIVATE);
+	tree = copy_tree(mnt, dentry, CL_COPY_ALL | CL_PRIVATE, 0);
 	up_read(&namespace_sem);
 	return tree;
 }
@@ -1475,7 +1478,8 @@ static noinline int do_change_type(struc
 static noinline int do_loopback(struct nameidata *nd, char *old_name,
 				int flags)
 {
-	int clone_fl;
+	int clone_fl = 0;
+	uid_t owner = 0;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err;
@@ -1496,11 +1500,17 @@ static noinline int do_loopback(struct n
 	if (!check_mnt(nd->path.mnt) || !check_mnt(old_nd.path.mnt))
 		goto out;
 
-	clone_fl = (flags & MS_SETUSER) ? CL_SETUSER : 0;
+	if (flags & MS_SETUSER) {
+		clone_fl |= CL_SETUSER;
+		owner = current->fsuid;
+	}
+
 	if (flags & MS_REC)
-		mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, clone_fl);
+		mnt = copy_tree(old_nd.path.mnt, old_nd.path.dentry, clone_fl,
+				owner);
 	else
-		mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, clone_fl);
+		mnt = clone_mnt(old_nd.path.mnt, old_nd.path.dentry, clone_fl,
+				owner);
 
 	err = PTR_ERR(mnt);
 	if (IS_ERR(mnt))
@@ -1729,7 +1739,7 @@ static noinline int do_new_mount(struct 
 	}
 
 	if (flags & MS_SETUSER)
-		__set_mnt_user(mnt);
+		__set_mnt_user(mnt, current->fsuid);
 
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
 
@@ -2125,7 +2135,7 @@ static struct mnt_namespace *dup_mnt_ns(
 	down_write(&namespace_sem);
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
-					CL_COPY_ALL | CL_EXPIRE);
+					CL_COPY_ALL | CL_EXPIRE, 0);
 	if (IS_ERR(new_ns->root)) {
 		up_write(&namespace_sem);
 		kfree(new_ns);
Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c	2008-03-17 20:55:45.000000000 +0100
+++ linux/fs/pnode.c	2008-03-17 20:55:52.000000000 +0100
@@ -298,15 +298,28 @@ int propagate_mnt(struct vfsmount *dest_
 
 	for (m = propagation_next(dest_mnt, dest_mnt); m;
 			m = propagation_next(m, dest_mnt)) {
-		int type;
+		int clflags;
+		uid_t owner = 0;
 		struct vfsmount *source;
 
 		if (IS_MNT_NEW(m))
 			continue;
 
-		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
+		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &clflags);
 
-		child = copy_tree(source, source->mnt_root, type);
+		if (m->mnt_flags & MNT_USER) {
+			clflags |= CL_SETUSER;
+			owner = m->mnt_uid;
+
+			/*
+			 * If propagating into a user mount which doesn't
+			 * allow suid, then make sure the child(ren) won't
+			 * allow suid either
+			 */
+			if (m->mnt_flags & MNT_NOSUID)
+				clflags |= CL_NOSUID;
+		}
+		child = copy_tree(source, source->mnt_root, clflags, owner);
 		if (IS_ERR(child)) {
 			ret = PTR_ERR(child);
 			list_splice(tree_list, tmp_list.prev);
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h	2008-03-17 20:55:42.000000000 +0100
+++ linux/fs/pnode.h	2008-03-17 20:55:52.000000000 +0100
@@ -23,6 +23,7 @@
 #define CL_PROPAGATION 		0x10
 #define CL_PRIVATE 		0x20
 #define CL_SETUSER		0x40
+#define CL_NOSUID		0x80
 
 void set_mnt_shared(struct vfsmount *);
 void clear_mnt_shared(struct vfsmount *);
@@ -32,6 +33,7 @@ int propagate_mnt(struct vfsmount *, str
 		struct list_head *);
 int propagate_umount(struct list_head *);
 int propagate_mount_busy(struct vfsmount *, int);
+struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int, uid_t);
 
 int get_peer_group_id(struct vfsmount *);
 int get_master_group_id(struct vfsmount *);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-03-17 20:55:50.000000000 +0100
+++ linux/include/linux/fs.h	2008-03-17 20:55:52.000000000 +0100
@@ -1546,7 +1546,6 @@ extern int may_umount(struct vfsmount *)
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern void release_mounts(struct list_head *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
-extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
 extern void mnt_set_mountpoint(struct vfsmount *, struct dentry *,
 				  struct vfsmount *);
 extern struct vfsmount *collect_mounts(struct vfsmount *, struct dentry *);

--


* [patch 10/11] unprivileged mounts: add "no submounts" flag
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (8 preceding siblings ...)
  2008-03-17 20:01 ` [patch 09/11] unprivileged mounts: propagation: inherit owner from parent Miklos Szeredi
@ 2008-03-17 20:01 ` Miklos Szeredi
  2008-03-17 20:01 ` [patch 11/11] unprivileged mounts: copy mount ownership on namespace cloning Miklos Szeredi
  2008-03-17 22:51 ` [patch 00/11] mount ownership and unprivileged mount syscall (v9) James Morris
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:01 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-add-no-submounts-flag.patch --]
[-- Type: text/plain, Size: 3089 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add a new mount flag "nosubmnt", which denies submounts for the owner.
This would be useful if we want to support traditional /etc/fstab
based user mounts.

In this case mount(8) would still have to be suid-root, to check the
mountpoint against the user/users flag in /etc/fstab, but /etc/mtab
would no longer be mandatory for storing the actual owner of the
mount.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 fs/namespace.c        |   10 ++++++++--
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 3 files changed, 10 insertions(+), 2 deletions(-)
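
The flag plumbing can be sketched in userspace C.  MS_NOSUBMNT and
MNT_NOSUBMNT use the values defined by this patch; the model_*()
helpers are hypothetical and reduce do_mount() and permit_mount() to
the parts that concern this flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MS_NOSUBMNT	(1UL << 25)	/* value from this patch (fs.h) */
#define MNT_NOSUBMNT	0x80		/* value from this patch (mount.h) */

/*
 * Like do_mount(): translate the syscall flag into the per-mount
 * flag, then strip it from the flags passed further down.
 */
int model_translate_flags(unsigned long *flags)
{
	int mnt_flags = 0;

	if (*flags & MS_NOSUBMNT)
		mnt_flags |= MNT_NOSUBMNT;
	*flags &= ~MS_NOSUBMNT;
	return mnt_flags;
}

/*
 * Like the new check in permit_mount(): an unprivileged caller may
 * not mount below a "nosubmnt" mount, even when owning it.
 */
bool model_submount_allowed(bool has_cap_sys_admin, int parent_mnt_flags,
			    bool caller_owns_parent)
{
	if (has_cap_sys_admin)
		return true;
	if (parent_mnt_flags & MNT_NOSUBMNT)
		return false;
	return caller_owns_parent;
}

/* Small usage example: the owner is refused below a nosubmnt mount. */
void example_usage(void)
{
	unsigned long flags = MS_NOSUBMNT;
	int mnt_flags = model_translate_flags(&flags);

	assert(mnt_flags == MNT_NOSUBMNT);
	assert(!model_submount_allowed(false, mnt_flags, true));
}
```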

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:52.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:53.000000000 +0100
@@ -810,6 +810,7 @@ static void show_mnt_opts(struct seq_fil
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_NOSUBMNT, ",nosubmnt" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
@@ -1230,6 +1231,9 @@ static bool permit_mount(struct nameidat
 	if (S_ISLNK(inode->i_mode))
 		return false;
 
+	if (nd->path.mnt->mnt_flags & MNT_NOSUBMNT)
+		return false;
+
 	if (!is_mount_owner(nd->path.mnt, current->fsuid))
 		return false;
 
@@ -2082,9 +2086,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_RELATIME;
 	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
+	if (flags & MS_NOSUBMNT)
+		mnt_flags |= MNT_NOSUBMNT;
 
-	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT);
+	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_NOATIME |
+		   MS_NODIRATIME | MS_RELATIME | MS_KERNMOUNT | MS_NOSUBMNT);
 
 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2008-03-17 20:55:52.000000000 +0100
+++ linux/include/linux/fs.h	2008-03-17 20:55:53.000000000 +0100
@@ -129,6 +129,7 @@ extern int dir_notify_enable;
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_SETUSER	(1<<24) /* set mnt_uid to current user */
+#define MS_NOSUBMNT	(1<<25) /* don't allow unprivileged submounts */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h	2008-03-17 20:55:42.000000000 +0100
+++ linux/include/linux/mount.h	2008-03-17 20:55:53.000000000 +0100
@@ -30,6 +30,7 @@ struct mnt_namespace;
 #define MNT_NODIRATIME	0x10
 #define MNT_RELATIME	0x20
 #define MNT_READONLY	0x40	/* does the user want this to be r/o? */
+#define MNT_NOSUBMNT	0x80
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_IMBALANCED_WRITE_COUNT	0x200 /* just for debugging */

--


* [patch 11/11] unprivileged mounts: copy mount ownership on namespace cloning
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (9 preceding siblings ...)
  2008-03-17 20:01 ` [patch 10/11] unprivileged mounts: add "no submounts" flag Miklos Szeredi
@ 2008-03-17 20:01 ` Miklos Szeredi
  2008-03-17 22:51 ` [patch 00/11] mount ownership and unprivileged mount syscall (v9) James Morris
  11 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-17 20:01 UTC (permalink / raw)
  To: akpm, hch, serue, viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: unprivileged-mounts-clone-inherit-owner.patch --]
[-- Type: text/plain, Size: 1595 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Mount ownership wasn't copied when the mount namespace was cloned with
CLONE_NEWNS.  Noticed by Al Viro.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
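A minimal userspace sketch of the ownership rule this patch adds to
clone_mnt(), not kernel code.  The CL_* values are taken from fs/pnode.h
as shown in this patch; struct model_vfsmount and model_clone_mnt() are
hypothetical names, the user_owned field stands in for the MNT_USER
flag, and the reserve_user_mount() accounting is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in fs/pnode.h by this series. */
#define CL_SETUSER	0x40
#define CL_COPYUSER	0x100

/* Hypothetical stand-in for the ownership fields of struct vfsmount. */
struct model_vfsmount {
	bool user_owned;	/* models MNT_USER being set */
	unsigned int mnt_uid;
};

/*
 * Models the ownership handling in clone_mnt(): with CL_COPYUSER, a
 * user-owned source mount passes its owner on to the clone by forcing
 * CL_SETUSER and overriding the owner argument.
 */
static struct model_vfsmount model_clone_mnt(const struct model_vfsmount *old,
					     int flag, unsigned int owner)
{
	struct model_vfsmount mnt = { false, 0 };

	if ((flag & CL_COPYUSER) && old->user_owned) {
		owner = old->mnt_uid;
		flag |= CL_SETUSER;
	}

	if (flag & CL_SETUSER) {
		mnt.user_owned = true;
		mnt.mnt_uid = owner;
	}

	return mnt;
}
```

Calling model_clone_mnt() without CL_COPYUSER and with owner 0, as
dup_mnt_ns() effectively did before this patch, yields a clone with no
owner.  Adding CL_COPYUSER keeps the original mnt_uid.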
 fs/namespace.c |    7 ++++++-
 fs/pnode.h     |    1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2008-03-17 20:55:53.000000000 +0100
+++ linux/fs/namespace.c	2008-03-17 20:55:53.000000000 +0100
@@ -585,6 +585,11 @@ static struct vfsmount *clone_mnt(struct
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt;
 
+	if ((flag & CL_COPYUSER) && (old->mnt_flags & MNT_USER)) {
+		owner = old->mnt_uid;
+		flag |= CL_SETUSER;
+	}
+
 	if (flag & CL_SETUSER) {
 		int err = reserve_user_mount();
 		if (err)
@@ -2141,7 +2146,7 @@ static struct mnt_namespace *dup_mnt_ns(
 	down_write(&namespace_sem);
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
-					CL_COPY_ALL | CL_EXPIRE, 0);
+				 CL_COPY_ALL | CL_EXPIRE | CL_COPYUSER, 0);
 	if (IS_ERR(new_ns->root)) {
 		up_write(&namespace_sem);
 		kfree(new_ns);
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h	2008-03-17 20:55:52.000000000 +0100
+++ linux/fs/pnode.h	2008-03-17 20:55:53.000000000 +0100
@@ -24,6 +24,7 @@
 #define CL_PRIVATE 		0x20
 #define CL_SETUSER		0x40
 #define CL_NOSUID		0x80
+#define CL_COPYUSER		0x100
 
 void set_mnt_shared(struct vfsmount *);
 void clear_mnt_shared(struct vfsmount *);

--


* Re: [patch 00/11] mount ownership and unprivileged mount syscall (v9)
  2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
                   ` (10 preceding siblings ...)
  2008-03-17 20:01 ` [patch 11/11] unprivileged mounts: copy mount ownership on namespace cloning Miklos Szeredi
@ 2008-03-17 22:51 ` James Morris
  2008-03-18 11:33   ` Miklos Szeredi
  11 siblings, 1 reply; 15+ messages in thread
From: James Morris @ 2008-03-17 22:51 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, Christoph Hellwig, serue, viro, linux-fsdevel,
	linux-kernel, Stephen Smalley, Eric Paris, linux-security-module

Something to consider down the track would be how to possibly allow this 
with SELinux, which only knows about normal mounts.

We might need a user_mount hook which is called once the core kernel code 
determines that it is a valid unprivileged mount (although the sb_mount 
hook will already have been called, IIUC).


- James
-- 
James Morris
<jmorris@namei.org>


* Re: [patch 00/11] mount ownership and unprivileged mount syscall (v9)
  2008-03-17 22:51 ` [patch 00/11] mount ownership and unprivileged mount syscall (v9) James Morris
@ 2008-03-18 11:33   ` Miklos Szeredi
  2008-03-18 23:04     ` James Morris
  0 siblings, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2008-03-18 11:33 UTC (permalink / raw)
  To: jmorris
  Cc: miklos, akpm, hch, serue, viro, linux-fsdevel, linux-kernel, sds,
	eparis, linux-security-module

> Something to consider down the track would be how to possibly allow this 
> with SELinux, which only knows about normal mounts.

Right.

> We might need a user_mount hook which is called once the core kernel code 
> determines that it is a valid unprivileged mount (although the sb_mount 
> hook will already have been called, IIUC).

Does the order matter between core code's and the security module's
permission checks?  If it does, the cleanest would be to just move the
core checks before the sb_mount hook, no?

Miklos


* Re: [patch 00/11] mount ownership and unprivileged mount syscall (v9)
  2008-03-18 11:33   ` Miklos Szeredi
@ 2008-03-18 23:04     ` James Morris
  0 siblings, 0 replies; 15+ messages in thread
From: James Morris @ 2008-03-18 23:04 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, hch, serue, viro, linux-fsdevel, linux-kernel, sds, eparis,
	linux-security-module

On Tue, 18 Mar 2008, Miklos Szeredi wrote:

> > We might need a user_mount hook which is called once the core kernel code 
> > determines that it is a valid unprivileged mount (although the sb_mount 
> > hook will already have been called, IIUC).
> 
> Does the order matter between core code's and the security module's
> permission checks?

Yes, the model is DAC before MAC.

>  If it does, the cleanest would be to just move the
> core checks before the sb_mount hook, no?

Correct.
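
For illustration only, the ordering agreed on above could be modelled in
userspace roughly as follows.  This is a sketch, not the kernel's code:
struct model_request, model_core_check(), model_security_hook() and
model_do_mount() are hypothetical names, and the error codes are merely
plausible choices.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state; mac_calls counts security hook calls. */
struct model_request {
	bool dac_ok;	/* would the core (DAC) checks pass? */
	bool mac_ok;	/* would the security module (MAC) allow it? */
	int mac_calls;
};

/* Stands in for the core kernel permission checks. */
static int model_core_check(struct model_request *req)
{
	return req->dac_ok ? 0 : -EPERM;
}

/* Stands in for a security hook such as sb_mount. */
static int model_security_hook(struct model_request *req)
{
	req->mac_calls++;
	return req->mac_ok ? 0 : -EACCES;
}

/* DAC before MAC: the hook runs only if the core checks passed. */
static int model_do_mount(struct model_request *req)
{
	int err = model_core_check(req);

	if (err)
		return err;

	return model_security_hook(req);
}
```

In this model a request that fails the core checks returns before the
hook is reached, so the security module never sees it.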

-- 
James Morris
<jmorris@namei.org>


end of thread, other threads:[~2008-03-19 21:36 UTC | newest]

Thread overview: 15+ messages
2008-03-17 20:00 [patch 00/11] mount ownership and unprivileged mount syscall (v9) Miklos Szeredi
2008-03-17 20:00 ` [patch 01/11] unprivileged mounts: add user mounts to the kernel Miklos Szeredi
2008-03-17 20:00 ` [patch 02/11] unprivileged mounts: allow unprivileged umount Miklos Szeredi
2008-03-17 20:00 ` [patch 03/11] unprivileged mounts: propagate error values from clone_mnt Miklos Szeredi
2008-03-17 20:00 ` [patch 04/11] unprivileged mounts: account user mounts Miklos Szeredi
2008-03-17 20:00 ` [patch 05/11] unprivileged mounts: allow unprivileged bind mounts Miklos Szeredi
2008-03-17 20:00 ` [patch 06/11] unprivileged mounts: allow unprivileged mounts Miklos Szeredi
2008-03-17 20:01 ` [patch 07/11] unprivileged mounts: add sysctl tunable for "safe" property Miklos Szeredi
2008-03-17 20:01 ` [patch 08/11] unprivileged mounts: make fuse safe Miklos Szeredi
2008-03-17 20:01 ` [patch 09/11] unprivileged mounts: propagation: inherit owner from parent Miklos Szeredi
2008-03-17 20:01 ` [patch 10/11] unprivileged mounts: add "no submounts" flag Miklos Szeredi
2008-03-17 20:01 ` [patch 11/11] unprivileged mounts: copy mount ownership on namespace cloning Miklos Szeredi
2008-03-17 22:51 ` [patch 00/11] mount ownership and unprivileged mount syscall (v9) James Morris
2008-03-18 11:33   ` Miklos Szeredi
2008-03-18 23:04     ` James Morris
