[patch 0/8] unprivileged mount syscall

public inbox for linux-fsdevel@vger.kernel.org

* [patch 0/8] unprivileged mount syscall
@ 2007-04-04 18:30 Miklos Szeredi
  2007-04-04 18:30 ` [patch 1/8] add user mounts to the kernel Miklos Szeredi
                   ` (9 more replies)
  0 siblings, 10 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

This patchset adds support for keeping mount ownership information in
the kernel, and allows unprivileged mount(2) and umount(2) in certain
cases.

This can be useful for the following reasons:

- mount(8) can store ownership ("user=XY" option) in the kernel
  instead of, or in addition to, storing it in /etc/mtab.  For example,
  if private namespaces are used with mount propagation, /etc/mtab
  becomes unworkable, but using /proc/mounts works fine

- fuse won't need a special suid-root mount/umount utility.  Plain
  umount(8) can easily be made to work with unprivileged fuse mounts

- users can use bind mounts without having to pre-configure them in
  /etc/fstab

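All this is done in a secure way: unprivileged mounts always get the
"nosuid" and "nodev" flags, and the number of user mounts is limited.
Unprivileged bind and fuse mounts are disabled by default and can be
enabled through the fs.max_user_mounts sysctl
(/proc/sys/fs/max_user_mounts).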

One thing that is missing from this series is the ability to restrict
user mounts to private namespaces.  The reason is that private
namespaces have still not gained the momentum and support needed for
a painless user experience, so such a feature would not yet get enough
attention and testing.  However, adding such an optional restriction
can be done with minimal changes in the future, once private
namespaces have matured.

An earlier version of these patches was discussed here:

  http://lkml.org/lkml/2005/5/3/64

--


* [patch 1/8] add user mounts to the kernel
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 2/8] allow unprivileged umount Miklos Szeredi
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: mount_owner.patch --]
[-- Type: text/plain, Size: 6257 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add ownership information to mounts.

A new mount flag, MS_SETUSER, is used to make a mount owned by a user.
If this flag is specified, then the owner will be set to the current
real user id and the mount will be marked with the MNT_USER flag.  On
remount, don't preserve the previous owner, and treat MS_SETUSER as for
a new mount.  The MS_SETUSER flag is ignored on mount move.

The MNT_USER flag is not copied on any kind of mount cloning:
namespace creation, binding or propagation.  For bind mounts the
cloned mount(s) are set to MNT_USER depending on the MS_SETUSER mount
flag.  In all the other cases MNT_USER is always cleared.

For MNT_USER mounts a "user=UID" option is added to /proc/PID/mounts.
This is compatible with how mount ownership is stored in /etc/mtab.

It is expected that in the future mount(8) will use MS_SETUSER to
store mount ownership within the kernel.  This would help in
situations where /etc/mtab is difficult or impossible to work with,
e.g. when using mount propagation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
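Illustration only, not part of this patch: since owned mounts show up
with a "user=UID" entry in the options field of /proc/PID/mounts, a
userspace tool could recover the owner roughly as sketched below.  The
helper name mount_owner_from_opts() is hypothetical, and the sketch
assumes the caller has already split out the options field of one
line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper (an assumption, not code from the
 * patch): scan a comma-separated mount options string for a
 * "user=UID" entry.  Returns 1 and stores the numeric UID if one is
 * found, 0 otherwise.
 */
static int mount_owner_from_opts(const char *opts, long *uid)
{
	const char *p = opts;

	assert(opts != NULL && uid != NULL);
	while (*p) {
		size_t len = strcspn(p, ",");	/* length of this option */

		if (len > 5 && strncmp(p, "user=", 5) == 0) {
			char *end;
			long val = strtol(p + 5, &end, 10);

			/* accept only if the whole value is numeric */
			if (end == p + len) {
				*uid = val;
				return 1;
			}
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

A caller would pass the fourth whitespace-separated field of a
/proc/PID/mounts line; fuse's own "user_id=" option is deliberately
not matched.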

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:41.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:29:47.000000000 +0200
@@ -227,6 +227,13 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	BUG_ON(mnt->mnt_flags & MNT_USER);
+	mnt->mnt_uid = current->uid;
+	mnt->mnt_flags |= MNT_USER;
+}
+
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
@@ -241,6 +248,11 @@ static struct vfsmount *clone_mnt(struct
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
 
+		/* don't copy the MNT_USER flag */
+		mnt->mnt_flags &= ~MNT_USER;
+		if (flag & CL_SETUSER)
+			set_mnt_user(mnt);
+
 		if (flag & CL_SLAVE) {
 			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
 			mnt->mnt_master = old;
@@ -390,6 +402,8 @@ static int show_vfsmnt(struct seq_file *
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
+	if (mnt->mnt_flags & MNT_USER)
+		seq_printf(m, ",user=%i", mnt->mnt_uid);
 	if (mnt->mnt_sb->s_op->show_options)
 		err = mnt->mnt_sb->s_op->show_options(m, mnt);
 	seq_puts(m, " 0 0\n");
@@ -901,8 +915,9 @@ static int do_change_type(struct nameida
 /*
  * do loopback mount.
  */
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static int do_loopback(struct nameidata *nd, char *old_name, int flags)
 {
+	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err = mount_is_safe(nd);
@@ -922,11 +937,12 @@ static int do_loopback(struct nameidata 
 	if (!check_mnt(nd->mnt) || !check_mnt(old_nd.mnt))
 		goto out;
 
+	clone_flags = (flags & MS_SETUSER) ? CL_SETUSER : 0;
 	err = -ENOMEM;
-	if (recurse)
-		mnt = copy_tree(old_nd.mnt, old_nd.dentry, 0);
+	if (flags & MS_REC)
+		mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags);
 	else
-		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, 0);
+		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags);
 
 	if (!mnt)
 		goto out;
@@ -968,8 +984,11 @@ static int do_remount(struct nameidata *
 
 	down_write(&sb->s_umount);
 	err = do_remount_sb(sb, flags, data, 0);
-	if (!err)
+	if (!err) {
 		nd->mnt->mnt_flags = mnt_flags;
+		if (flags & MS_SETUSER)
+			set_mnt_user(nd->mnt);
+	}
 	up_write(&sb->s_umount);
 	if (!err)
 		security_sb_post_remount(nd->mnt, flags, data);
@@ -1074,10 +1093,13 @@ static int do_new_mount(struct nameidata
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	mnt = do_kern_mount(type, flags, name, data);
+	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	if (flags & MS_SETUSER)
+		set_mnt_user(mnt);
+
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
 }
 
@@ -1108,7 +1130,8 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	newmnt->mnt_flags = mnt_flags;
+	/* MNT_USER was set earlier */
+	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
 
@@ -1428,7 +1451,7 @@ long do_mount(char *dev_name, char *dir_
 		retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+		retval = do_loopback(&nd, dev_name, flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:29:41.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:29:47.000000000 +0200
@@ -122,6 +122,7 @@ extern int dir_notify_enable;
 #define MS_SLAVE	(1<<19)	/* change to slave */
 #define MS_SHARED	(1<<20)	/* change to shared */
 #define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
+#define MS_SETUSER	(1<<22) /* set mnt_uid to current user */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h	2007-04-04 19:27:47.000000000 +0200
+++ linux/include/linux/mount.h	2007-04-04 19:29:47.000000000 +0200
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
 #define MNT_RELATIME	0x20
+#define MNT_USER	0x40
 
 #define MNT_SHRINKABLE	0x100
 
@@ -61,6 +62,8 @@ struct vfsmount {
 	atomic_t mnt_count;
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	int mnt_pinned;
+
+	uid_t mnt_uid;			/* owner of the mount */
 };
 
 static inline struct vfsmount *mntget(struct vfsmount *mnt)
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h	2007-04-04 19:27:47.000000000 +0200
+++ linux/fs/pnode.h	2007-04-04 19:29:47.000000000 +0200
@@ -22,6 +22,7 @@
 #define CL_COPY_ALL 		0x04
 #define CL_MAKE_SHARED 		0x08
 #define CL_PROPAGATION 		0x10
+#define CL_SETUSER		0x20
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {

--


* [patch 2/8] allow unprivileged umount
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
  2007-04-04 18:30 ` [patch 1/8] add user mounts to the kernel Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 3/8] account user mounts Miklos Szeredi
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: unprivileged_umount.patch --]
[-- Type: text/plain, Size: 1373 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

The owner of a mount doesn't need sysadmin capabilities to call
umount(2), as long as the umount is not forced (MNT_FORCE).

This is similar to the behavior of umount(8) on mounts having the
"user=UID" option in /etc/mtab.  The difference is that umount(8) also
checks /etc/fstab, presumably to exclude another mount on the same
mountpoint.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:47.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:29:54.000000000 +0200
@@ -640,6 +640,25 @@ static int do_umount(struct vfsmount *mn
 }
 
 /*
+ * umount is permitted for
+ *  - sysadmin
+ *  - mount owner, if not forced umount
+ */
+static bool permit_umount(struct vfsmount *mnt, int flags)
+{
+	if (capable(CAP_SYS_ADMIN))
+		return true;
+
+	if (!(mnt->mnt_flags & MNT_USER))
+		return false;
+
+	if (flags & MNT_FORCE)
+		return false;
+
+	return mnt->mnt_uid == current->uid;
+}
+
+/*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
  *
@@ -662,7 +681,7 @@ asmlinkage long sys_umount(char __user *
 		goto dput_and_out;
 
 	retval = -EPERM;
-	if (!capable(CAP_SYS_ADMIN))
+	if (!permit_umount(nd.mnt, flags))
 		goto dput_and_out;
 
 	retval = do_umount(nd.mnt, flags);

--


* [patch 3/8] account user mounts
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
  2007-04-04 18:30 ` [patch 1/8] add user mounts to the kernel Miklos Szeredi
  2007-04-04 18:30 ` [patch 2/8] allow unprivileged umount Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 4/8] propagate error values from clone_mnt Miklos Szeredi
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: account_user_mounts.patch --]
[-- Type: text/plain, Size: 4851 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add sysctl variables for accounting and limiting the number of user
mounts.

The maximum number of user mounts is set to zero by default.  This
matches the behavior of previous kernels.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/include/linux/sysctl.h
===================================================================
--- linux.orig/include/linux/sysctl.h	2007-04-04 19:28:03.000000000 +0200
+++ linux/include/linux/sysctl.h	2007-04-04 19:29:57.000000000 +0200
@@ -813,6 +813,8 @@ enum
 	FS_AIO_NR=18,	/* current system-wide number of aio requests */
 	FS_AIO_MAX_NR=19,	/* system-wide maximum number of aio requests */
 	FS_INOTIFY=20,	/* inotify submenu */
+	FS_NR_USER_MOUNTS=21,	/* int:current number of user mounts */
+	FS_MAX_USER_MOUNTS=22,	/* int:maximum number of user mounts */
 	FS_OCFS2=988,	/* ocfs2 */
 };
 
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c	2007-04-04 19:28:03.000000000 +0200
+++ linux/kernel/sysctl.c	2007-04-04 19:29:57.000000000 +0200
@@ -984,6 +984,22 @@ static ctl_table fs_table[] = {
 #endif	
 #endif
 	{
+		.ctl_name	= FS_NR_USER_MOUNTS,
+		.procname	= "nr_user_mounts",
+		.data		= &nr_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= FS_MAX_USER_MOUNTS,
+		.procname	= "max_user_mounts",
+		.data		= &max_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= KERN_SETUID_DUMPABLE,
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt	2007-04-04 19:27:59.000000000 +0200
+++ linux/Documentation/filesystems/proc.txt	2007-04-04 19:29:57.000000000 +0200
@@ -922,6 +922,18 @@ reaches aio-max-nr then io_setup will fa
 raising aio-max-nr does not result in the pre-allocation or re-sizing
 of any kernel data structures.
 
+nr_user_mounts and max_user_mounts
+----------------------------------
+
+These represent the number of "user" mounts and the maximum number of
+"user" mounts respectively.  User mounts may be created by
+unprivileged users.  User mounts may also be created with sysadmin
+privileges on behalf of a user, in which case nr_user_mounts may
+exceed max_user_mounts.
+
+By default max_user_mounts is zero.  If you wish to enable
+unprivileged mounts, set it to some sane value (e.g. 1000).
+
 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
 -----------------------------------------------------------
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:54.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:29:57.000000000 +0200
@@ -39,6 +39,9 @@ static int hash_mask __read_mostly, hash
 static struct kmem_cache *mnt_cache __read_mostly;
 static struct rw_semaphore namespace_sem;
 
+int nr_user_mounts;
+int max_user_mounts;
+
 /* /sys/fs */
 decl_subsys(fs, NULL, NULL);
 EXPORT_SYMBOL_GPL(fs_subsys);
@@ -227,11 +230,30 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void dec_nr_user_mounts(void)
+{
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts--;
+	spin_unlock(&vfsmount_lock);
+}
+
 static void set_mnt_user(struct vfsmount *mnt)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->uid;
 	mnt->mnt_flags |= MNT_USER;
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+}
+
+static void clear_mnt_user(struct vfsmount *mnt)
+{
+	if (mnt->mnt_flags & MNT_USER) {
+		mnt->mnt_uid = 0;
+		mnt->mnt_flags &= ~MNT_USER;
+		dec_nr_user_mounts();
+	}
 }
 
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
@@ -283,6 +305,7 @@ static inline void __mntput(struct vfsmo
 {
 	struct super_block *sb = mnt->mnt_sb;
 	dput(mnt->mnt_root);
+	clear_mnt_user(mnt);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
 }
@@ -1004,6 +1027,7 @@ static int do_remount(struct nameidata *
 	down_write(&sb->s_umount);
 	err = do_remount_sb(sb, flags, data, 0);
 	if (!err) {
+		clear_mnt_user(nd->mnt);
 		nd->mnt->mnt_flags = mnt_flags;
 		if (flags & MS_SETUSER)
 			set_mnt_user(nd->mnt);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:29:47.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:29:57.000000000 +0200
@@ -49,6 +49,9 @@ extern struct inodes_stat_t inodes_stat;
 
 extern int leases_enable, lease_break_time;
 
+extern int nr_user_mounts;
+extern int max_user_mounts;
+
 #ifdef CONFIG_DNOTIFY
 extern int dir_notify_enable;
 #endif

--


* [patch 4/8] propagate error values from clone_mnt
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
                   ` (2 preceding siblings ...)
  2007-04-04 18:30 ` [patch 3/8] account user mounts Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 5/8] allow unprivileged bind mounts Miklos Szeredi
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: clone_return_errno.patch --]
[-- Type: text/plain, Size: 5150 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow clone_mnt() to return errors other than ENOMEM.  This will be
used for returning a different error value when the number of user
mounts goes over the limit.

Fix copy_tree() to return EPERM for unbindable mounts.

Don't propagate the error further from dup_mnt_ns(), since that
copy_tree() call can only fail with -ENOMEM.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
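Illustration only, not part of this patch: the change relies on the
kernel's idiom of encoding a negative errno in the returned pointer
(ERR_PTR(), IS_ERR(), PTR_ERR()).  The userspace sketch below mimics
that idiom with hypothetical sk_ helpers; the 4095 bound and the
sk_clone() function are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_MAX_ERRNO 4095	/* assumed bound on errno values */

/* Encode a negative errno value in a pointer. */
static inline void *sk_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static inline long sk_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer lies in the top SK_MAX_ERRNO addresses. */
static inline int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/*
 * Hypothetical allocator that can report why it failed, which a
 * plain NULL return cannot do.
 */
static void *sk_clone(int over_limit)
{
	void *obj;

	if (over_limit)
		return sk_err_ptr(-EPERM);
	obj = malloc(16);
	if (!obj)
		return sk_err_ptr(-ENOMEM);
	return obj;
}
```

With this convention a caller can tell "over the limit" apart from
"out of memory", which is what the later patches in the series need.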

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:57.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:30:00.000000000 +0200
@@ -261,42 +261,42 @@ static struct vfsmount *clone_mnt(struct
 {
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
 
-	if (mnt) {
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-
-		/* don't copy the MNT_USER flag */
-		mnt->mnt_flags &= ~MNT_USER;
-		if (flag & CL_SETUSER)
-			set_mnt_user(mnt);
-
-		if (flag & CL_SLAVE) {
-			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
-			mnt->mnt_master = old;
-			CLEAR_MNT_SHARED(mnt);
-		} else {
-			if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
-				list_add(&mnt->mnt_share, &old->mnt_share);
-			if (IS_MNT_SLAVE(old))
-				list_add(&mnt->mnt_slave, &old->mnt_slave);
-			mnt->mnt_master = old->mnt_master;
-		}
-		if (flag & CL_MAKE_SHARED)
-			set_mnt_shared(mnt);
+	mnt->mnt_flags = old->mnt_flags;
+	atomic_inc(&sb->s_active);
+	mnt->mnt_sb = sb;
+	mnt->mnt_root = dget(root);
+	mnt->mnt_mountpoint = mnt->mnt_root;
+	mnt->mnt_parent = mnt;
+
+	/* don't copy the MNT_USER flag */
+	mnt->mnt_flags &= ~MNT_USER;
+	if (flag & CL_SETUSER)
+		set_mnt_user(mnt);
 
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		if (flag & CL_EXPIRE) {
-			spin_lock(&vfsmount_lock);
-			if (!list_empty(&old->mnt_expire))
-				list_add(&mnt->mnt_expire, &old->mnt_expire);
-			spin_unlock(&vfsmount_lock);
-		}
+	if (flag & CL_SLAVE) {
+		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
+		mnt->mnt_master = old;
+		CLEAR_MNT_SHARED(mnt);
+	} else {
+		if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
+			list_add(&mnt->mnt_share, &old->mnt_share);
+		if (IS_MNT_SLAVE(old))
+			list_add(&mnt->mnt_slave, &old->mnt_slave);
+		mnt->mnt_master = old->mnt_master;
+	}
+	if (flag & CL_MAKE_SHARED)
+		set_mnt_shared(mnt);
+
+	/* stick the duplicate mount on the same expiry list
+	 * as the original if that was on one */
+	if (flag & CL_EXPIRE) {
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_expire))
+			list_add(&mnt->mnt_expire, &old->mnt_expire);
+		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
 }
@@ -762,11 +762,11 @@ struct vfsmount *copy_tree(struct vfsmou
 	struct nameidata nd;
 
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
-		return NULL;
+		return ERR_PTR(-EPERM);
 
 	res = q = clone_mnt(mnt, dentry, flag);
-	if (!q)
-		goto Enomem;
+	if (IS_ERR(q))
+		goto error;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
@@ -787,8 +787,8 @@ struct vfsmount *copy_tree(struct vfsmou
 			nd.mnt = q;
 			nd.dentry = p->mnt_mountpoint;
 			q = clone_mnt(p, p->mnt_root, flag);
-			if (!q)
-				goto Enomem;
+			if (IS_ERR(q))
+				goto error;
 			spin_lock(&vfsmount_lock);
 			list_add_tail(&q->mnt_list, &res->mnt_list);
 			attach_mnt(q, &nd);
@@ -796,7 +796,7 @@ struct vfsmount *copy_tree(struct vfsmou
 		}
 	}
 	return res;
-Enomem:
+ error:
 	if (res) {
 		LIST_HEAD(umount_list);
 		spin_lock(&vfsmount_lock);
@@ -804,7 +804,7 @@ Enomem:
 		spin_unlock(&vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-	return NULL;
+	return q;
 }
 
 /*
@@ -980,13 +980,13 @@ static int do_loopback(struct nameidata 
 		goto out;
 
 	clone_flags = (flags & MS_SETUSER) ? CL_SETUSER : 0;
-	err = -ENOMEM;
 	if (flags & MS_REC)
 		mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags);
 	else
 		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags);
 
-	if (!mnt)
+	err = PTR_ERR(mnt);
+	if (IS_ERR(mnt))
 		goto out;
 
 	err = graft_tree(mnt, nd);
@@ -1532,7 +1532,7 @@ struct mnt_namespace *dup_mnt_ns(struct 
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
 					CL_COPY_ALL | CL_EXPIRE);
-	if (!new_ns->root) {
+	if (IS_ERR(new_ns->root)) {
 		up_write(&namespace_sem);
 		kfree(new_ns);
 		return NULL;
Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c	2007-04-04 19:27:47.000000000 +0200
+++ linux/fs/pnode.c	2007-04-04 19:30:00.000000000 +0200
@@ -187,8 +187,9 @@ int propagate_mnt(struct vfsmount *dest_
 
 		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
 
-		if (!(child = copy_tree(source, source->mnt_root, type))) {
-			ret = -ENOMEM;
+		child = copy_tree(source, source->mnt_root, type);
+		if (IS_ERR(child)) {
+			ret = PTR_ERR(child);
 			list_splice(tree_list, tmp_list.prev);
 			goto out;
 		}

--


* [patch 5/8] allow unprivileged bind mounts
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
                   ` (3 preceding siblings ...)
  2007-04-04 18:30 ` [patch 4/8] propagate error values from clone_mnt Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 6/8] put declaration of put_filesystem() in fs.h Miklos Szeredi
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: unprivileged_bind_mount.patch --]
[-- Type: text/plain, Size: 3610 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow unprivileged users to create bind mounts if the following
conditions are met:

  - mountpoint is not a symlink or special file
  - mountpoint is not a sticky directory or is owned by the current user
  - mountpoint is writable by user
  - the number of user mounts is below the maximum

Unprivileged mounts imply MS_SETUSER, and will also have the "nosuid"
and "nodev" mount flags set.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:30:00.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:30:02.000000000 +0200
@@ -237,11 +237,30 @@ static void dec_nr_user_mounts(void)
 	spin_unlock(&vfsmount_lock);
 }
 
-static void set_mnt_user(struct vfsmount *mnt)
+static int reserve_user_mount(void)
+{
+	int err = 0;
+	spin_lock(&vfsmount_lock);
+	if (nr_user_mounts >= max_user_mounts && !capable(CAP_SYS_ADMIN))
+		err = -EPERM;
+	else
+		nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+	return err;
+}
+
+static void __set_mnt_user(struct vfsmount *mnt)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->uid;
 	mnt->mnt_flags |= MNT_USER;
+	if (!capable(CAP_SYS_ADMIN))
+		mnt->mnt_flags |= MNT_NOSUID | MNT_NODEV;
+}
+
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	__set_mnt_user(mnt);
 	spin_lock(&vfsmount_lock);
 	nr_user_mounts++;
 	spin_unlock(&vfsmount_lock);
@@ -260,9 +279,16 @@ static struct vfsmount *clone_mnt(struct
 					int flag)
 {
 	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	struct vfsmount *mnt;
+
+	if (flag & CL_SETUSER) {
+		int err = reserve_user_mount();
+		if (err)
+			return ERR_PTR(err);
+	}
+	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
-		return ERR_PTR(-ENOMEM);
+		goto alloc_failed;
 
 	mnt->mnt_flags = old->mnt_flags;
 	atomic_inc(&sb->s_active);
@@ -274,7 +300,7 @@ static struct vfsmount *clone_mnt(struct
 	/* don't copy the MNT_USER flag */
 	mnt->mnt_flags &= ~MNT_USER;
 	if (flag & CL_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	if (flag & CL_SLAVE) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
@@ -299,6 +325,11 @@ static struct vfsmount *clone_mnt(struct
 		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
+
+ alloc_failed:
+	if (flag & CL_SETUSER)
+		dec_nr_user_mounts();
+	return ERR_PTR(-ENOMEM);
 }
 
 static inline void __mntput(struct vfsmount *mnt)
@@ -726,22 +757,23 @@ asmlinkage long sys_oldumount(char __use
 
 #endif
 
-static int mount_is_safe(struct nameidata *nd)
+static int mount_is_safe(struct nameidata *nd, int *flags)
 {
 	if (capable(CAP_SYS_ADMIN))
 		return 0;
-	return -EPERM;
-#ifdef notyet
-	if (S_ISLNK(nd->dentry->d_inode->i_mode))
+
+	if (!S_ISDIR(nd->dentry->d_inode->i_mode) &&
+	    !S_ISREG(nd->dentry->d_inode->i_mode))
 		return -EPERM;
 	if (nd->dentry->d_inode->i_mode & S_ISVTX) {
-		if (current->uid != nd->dentry->d_inode->i_uid)
+		if (current->fsuid != nd->dentry->d_inode->i_uid)
 			return -EPERM;
 	}
-	if (vfs_permission(nd, MAY_WRITE))
+	if (vfs_permission(nd, MAY_WRITE) || IS_APPEND(nd->dentry->d_inode))
 		return -EPERM;
+
+	*flags |= MS_SETUSER;
 	return 0;
-#endif
 }
 
 static int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
@@ -962,7 +994,7 @@ static int do_loopback(struct nameidata 
 	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
-	int err = mount_is_safe(nd);
+	int err = mount_is_safe(nd, &flags);
 	if (err)
 		return err;
 	if (!old_name || !*old_name)

--


* [patch 6/8] put declaration of put_filesystem() in fs.h
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
                   ` (4 preceding siblings ...)
  2007-04-04 18:30 ` [patch 5/8] allow unprivileged bind mounts Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 7/8] allow unprivileged mounts Miklos Szeredi
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: put_filesystem_in_header.patch --]
[-- Type: text/plain, Size: 1282 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

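Declarations go into headers: move the declarations of
get_filesystem() and put_filesystem() from fs/super.c to
include/linux/fs.h, and drop the duplicate declaration of
get_fs_type() from fs/super.c.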

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c	2007-04-04 19:29:41.000000000 +0200
+++ linux/fs/super.c	2007-04-04 19:30:05.000000000 +0200
@@ -40,10 +40,6 @@
 #include <asm/uaccess.h>
 
 
-void get_filesystem(struct file_system_type *fs);
-void put_filesystem(struct file_system_type *fs);
-struct file_system_type *get_fs_type(const char *name);
-
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:29:57.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:30:05.000000000 +0200
@@ -1858,6 +1858,8 @@ extern int vfs_fstat(unsigned int, struc
 
 extern int vfs_ioctl(struct file *, unsigned int, unsigned int, unsigned long);
 
+extern void get_filesystem(struct file_system_type *fs);
+extern void put_filesystem(struct file_system_type *fs);
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(struct block_device *);
 extern struct super_block *user_get_super(dev_t);

--


* [patch 7/8] allow unprivileged mounts
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
                   ` (5 preceding siblings ...)
  2007-04-04 18:30 ` [patch 6/8] put declaration of put_filesystem() in fs.h Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-04 18:30 ` [patch 8/8] allow unprivileged fuse mounts Miklos Szeredi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: unprivileged_mount.patch --]
[-- Type: text/plain, Size: 3519 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Define a new fs flag, FS_SAFE, which marks a filesystem type as safe
to mount by unprivileged users.

Since most filesystems haven't been designed with unprivileged
mounting in mind, a thorough audit is needed before setting this flag.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:30:02.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:30:08.000000000 +0200
@@ -757,11 +757,15 @@ asmlinkage long sys_oldumount(char __use
 
 #endif
 
-static int mount_is_safe(struct nameidata *nd, int *flags)
+static int mount_is_safe(struct nameidata *nd, struct file_system_type *type,
+			 int *flags)
 {
 	if (capable(CAP_SYS_ADMIN))
 		return 0;
 
+	if (type && !(type->fs_flags & FS_SAFE))
+		return -EPERM;
+
 	if (!S_ISDIR(nd->dentry->d_inode->i_mode) &&
 	    !S_ISREG(nd->dentry->d_inode->i_mode))
 		return -EPERM;
@@ -994,7 +998,7 @@ static int do_loopback(struct nameidata 
 	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
-	int err = mount_is_safe(nd, &flags);
+	int err = mount_is_safe(nd, NULL, &flags);
 	if (err)
 		return err;
 	if (!old_name || !*old_name)
@@ -1156,26 +1160,46 @@ out:
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct nameidata *nd, char *type, int flags,
+static int do_new_mount(struct nameidata *nd, char *fstype, int flags,
 			int mnt_flags, char *name, void *data)
 {
+	int err;
 	struct vfsmount *mnt;
+	struct file_system_type *type;
 
-	if (!type || !memchr(type, 0, PAGE_SIZE))
+	if (!fstype || !memchr(fstype, 0, PAGE_SIZE))
 		return -EINVAL;
 
-	/* we need capabilities... */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	type = get_fs_type(fstype);
+	if (!type)
+		return -ENODEV;
 
-	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
-	if (IS_ERR(mnt))
+	err = mount_is_safe(nd, type, &flags);
+	if (err)
+		goto out_put_filesystem;
+
+	if (flags & MS_SETUSER) {
+		err = reserve_user_mount();
+		if (err)
+			goto out_put_filesystem;
+	}
+
+	mnt = vfs_kern_mount(type, flags & ~MS_SETUSER, name, data);
+	put_filesystem(type);
+	if (IS_ERR(mnt)) {
+		if (flags & MS_SETUSER)
+			dec_nr_user_mounts();
 		return PTR_ERR(mnt);
+	}
 
 	if (flags & MS_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
+
+ out_put_filesystem:
+	put_filesystem(type);
+	return err;
 }
 
 /*
@@ -1205,7 +1229,7 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	/* MNT_USER was set earlier */
+	/* some flags may have been set earlier */
 	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:30:05.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:30:08.000000000 +0200
@@ -95,6 +95,7 @@ extern int dir_notify_enable;
 #define FS_REQUIRES_DEV 1 
 #define FS_BINARY_MOUNTDATA 2
 #define FS_HAS_SUBTYPE 4
+#define FS_SAFE 8		/* Safe to mount by unprivileged users */
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
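
(Not part of the patch.)  The control flow that the do_new_mount() hunk
above introduces can be modelled in user space roughly as follows.  This
is only a sketch under stated assumptions: the struct and function names
ending in "_model" are hypothetical, the limit check inside
reserve_user_mount() and its -EPERM return value are guesses based on the
"fs.max_user_mounts" sysctl (the real helper is in an earlier patch of
the series), and mount_is_safe() is reduced to the FS_SAFE test.

```c
#include <errno.h>
#include <stdbool.h>

#define FS_SAFE 8	/* same value as in the fs.h hunk above */

/* Hypothetical stand-in for struct file_system_type. */
struct fs_type_model {
	int fs_flags;
};

/* Hypothetical stand-in for the global user mount accounting. */
struct user_mount_limits {
	int nr_user_mounts;
	int max_user_mounts;	/* 0 means user mounts are disabled */
};

/* Assumed behaviour: refuse once the configured limit is reached. */
int reserve_user_mount_model(struct user_mount_limits *lim)
{
	if (lim->nr_user_mounts >= lim->max_user_mounts)
		return -EPERM;
	lim->nr_user_mounts++;
	return 0;
}

/*
 * Mirrors the order of checks in the patched do_new_mount():
 * FS_SAFE test for unprivileged callers, reservation when the
 * mount is to be owned by a user, rollback if the mount fails.
 * mount_err stands for the result of vfs_kern_mount() (0 or -errno).
 */
int new_mount_model(const struct fs_type_model *type, bool privileged,
		    bool setuser, int mount_err,
		    struct user_mount_limits *lim)
{
	int err;

	if (!privileged && !(type->fs_flags & FS_SAFE))
		return -EPERM;

	if (setuser) {
		err = reserve_user_mount_model(lim);
		if (err)
			return err;
	}

	if (mount_err) {
		if (setuser)
			lim->nr_user_mounts--;	/* undo the reservation */
		return mount_err;
	}
	return 0;
}
```

The point of reserving before the mount and rolling back on failure is
that the counter can never exceed the limit, even transiently.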

--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch 8/8] allow unprivileged fuse mounts
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
                   ` (6 preceding siblings ...)
  2007-04-04 18:30 ` [patch 7/8] allow unprivileged mounts Miklos Szeredi
@ 2007-04-04 18:30 ` Miklos Szeredi
  2007-04-09 18:57 ` [patch 0/8] unprivileged mount syscall Serge E. Hallyn
       [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
  9 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
  To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: fuse_safe.patch --]
[-- Type: text/plain, Size: 2469 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Use FS_SAFE for "fuse" fs type, but not for "fuseblk".

FUSE was designed from the beginning to be safe for unprivileged
users.  This has also been verified in practice over many years.  The
sysadmin still needs to set the "fs.max_user_mounts" sysctl variable to a
non-zero value to enable unprivileged mounts.

This will enable future installations to remove the suid-root
fusermount utility.

Don't require the "user_id=" and "group_id=" options for unprivileged
mounts, but if they are present verify them for sanity.

Disallow the "allow_other" option for unprivileged mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
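(Not part of the patch.)  The option rules described above can be
illustrated with a small user-space model.  It is a sketch only:
parse_opts_model() and struct opts_model are hypothetical names, only
"user_id=" and "allow_other" are handled ("group_id=" would follow the
same pattern), and the error codes are simplified compared to
parse_fuse_opt(), which returns 0 on failure.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced version of struct fuse_mount_data. */
struct opts_model {
	unsigned user_id;
	int user_id_present;
	int allow_other;
};

/*
 * Parse a comma separated option string (modified in place).
 * Unprivileged callers get their own uid preset, so a "user_id="
 * option is accepted only if it matches, and "allow_other" is
 * rejected with -EPERM.
 */
int parse_opts_model(char *opt, int privileged, unsigned caller_uid,
		     struct opts_model *d)
{
	char *p, *next;

	memset(d, 0, sizeof(*d));
	if (!privileged) {
		d->user_id = caller_uid;
		d->user_id_present = 1;
	}

	for (p = opt; p != NULL; p = next) {
		char *comma = strchr(p, ',');

		next = comma ? comma + 1 : NULL;
		if (comma)
			*comma = '\0';
		if (*p == '\0')
			continue;	/* skip empty tokens */

		if (strncmp(p, "user_id=", 8) == 0) {
			char *end;
			unsigned long value;

			if (!isdigit((unsigned char)p[8]))
				return -EINVAL;
			value = strtoul(p + 8, &end, 10);
			if (*end != '\0' || value > UINT_MAX)
				return -EINVAL;
			if (d->user_id_present && d->user_id != value)
				return -EINVAL;
			d->user_id = (unsigned)value;
			d->user_id_present = 1;
		} else if (strcmp(p, "allow_other") == 0) {
			d->allow_other = 1;
		} else {
			return -EINVAL;
		}
	}

	/* This is a privileged option */
	if (d->allow_other && !privileged)
		return -EPERM;
	return 0;
}
```

Presetting the ids before the option loop keeps the loop itself free of
privilege checks: a mismatch is just another parse error.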

Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c	2007-04-04 19:29:44.000000000 +0200
+++ linux/fs/fuse/inode.c	2007-04-04 19:30:11.000000000 +0200
@@ -311,6 +311,19 @@ static int parse_fuse_opt(char *opt, str
 	d->max_read = ~0;
 	d->blksize = 512;
 
+	/*
+	 * For unprivileged mounts use current uid/gid.  Still allow
+	 * "user_id" and "group_id" options for compatibility, but
+	 * only if they match these values.
+	 */
+	if (!capable(CAP_SYS_ADMIN)) {
+		d->user_id = current->uid;
+		d->user_id_present = 1;
+		d->group_id = current->gid;
+		d->group_id_present = 1;
+
+	}
+
 	while ((p = strsep(&opt, ",")) != NULL) {
 		int token;
 		int value;
@@ -339,6 +352,8 @@ static int parse_fuse_opt(char *opt, str
 		case OPT_USER_ID:
 			if (match_int(&args[0], &value))
 				return 0;
+			if (d->user_id_present && d->user_id != value)
+				return 0;
 			d->user_id = value;
 			d->user_id_present = 1;
 			break;
@@ -346,6 +361,8 @@ static int parse_fuse_opt(char *opt, str
 		case OPT_GROUP_ID:
 			if (match_int(&args[0], &value))
 				return 0;
+			if (d->group_id_present && d->group_id != value)
+				return 0;
 			d->group_id = value;
 			d->group_id_present = 1;
 			break;
@@ -536,6 +553,10 @@ static int fuse_fill_super(struct super_
 	if (!parse_fuse_opt((char *) data, &d, is_bdev))
 		return -EINVAL;
 
+	/* This is a privileged option */
+	if ((d.flags & FUSE_ALLOW_OTHER) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (is_bdev) {
 #ifdef CONFIG_BLOCK
 		if (!sb_set_blocksize(sb, d.blksize))
@@ -639,6 +660,6 @@ static struct file_system_type fuse_fs_t
-	.fs_flags	= FS_HAS_SUBTYPE,
+	.fs_flags	= FS_HAS_SUBTYPE | FS_SAFE,
 	.get_sb		= fuse_get_sb,
 	.kill_sb	= kill_anon_super,
 };
 
 #ifdef CONFIG_BLOCK

--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
                   ` (7 preceding siblings ...)
  2007-04-04 18:30 ` [patch 8/8] allow unprivileged fuse mounts Miklos Szeredi
@ 2007-04-09 18:57 ` Serge E. Hallyn
  2007-04-09 20:14   ` Miklos Szeredi
       [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
  9 siblings, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-09 18:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-fsdevel, util-linux-ng

Quoting Miklos Szeredi (miklos@szeredi.hu):
> This patchset adds support for keeping mount ownership information in
> the kernel, and allow unprivileged mount(2) and umount(2) in certain
> cases.
> 
> This can be useful for the following reasons:
> 
> - mount(8) can store ownership ("user=XY" option) in the kernel
>   instead, or in addition to storing it in /etc/mtab.  For example if
>   private namespaces are used with mount propagations /etc/mtab
>   becomes unworkable, but using /proc/mounts works fine
> 
> - fuse won't need a special suid-root mount/umount utility.  Plain
>   umount(8) can easily be made to work with unprivileged fuse mounts
> 
> - users can use bind mounts without having to pre-configure them in
>   /etc/fstab
> 
> All this is done in a secure way, and unprivileged bind and fuse
> mounts are disabled by default and can be enabled through sysctl or
> /proc/sys.
> 
> One thing that is missing from this series is the ability to restrict
> user mounts to private namespaces.  The reason is that private
> namespaces have still not gained the momentum and support needed for
> painless user experience.  So such a feature would not yet get enough
> attention and testing.  However adding such an optional restriction
> can be done with minimal changes in the future, once private
> namespaces have matured.

What is the main reason for that feature?  Would it be to prevent things
like login from being tricked by user mounts?  Isn't it sufficient, in
fact, better, to require that the target of the mount be owned by the
user doing the mount?

-serge   (who's pretty sure he's missing something)

> An earlier version of these patches have been discussed here:
> 
>   http://lkml.org/lkml/2005/5/3/64
> 
> --
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 18:57 ` [patch 0/8] unprivileged mount syscall Serge E. Hallyn
@ 2007-04-09 20:14   ` Miklos Szeredi
  2007-04-09 20:55     ` Serge E. Hallyn
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-09 20:14 UTC (permalink / raw)
  To: serue; +Cc: akpm, linux-fsdevel, util-linux-ng

> > One thing that is missing from this series is the ability to restrict
> > user mounts to private namespaces.  The reason is that private
> > namespaces have still not gained the momentum and support needed for
> > painless user experience.  So such a feature would not yet get enough
> > attention and testing.  However adding such an optional restriction
> > can be done with minimal changes in the future, once private
> > namespaces have matured.
> 
> What is the main reason for that feature?  Would it be to prevent things
> like login from being tricked by user mounts?  Isn't it sufficient, in
> fact, better, to require that the target of the mount be owned by the
> user doing the mount?

It's been discussed later in that thread.  Basically you can fool a
lot of system programs (like backup) with mounting/binding in the
global namespace.  Restricting the destination doesn't always help.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 20:14   ` Miklos Szeredi
@ 2007-04-09 20:55     ` Serge E. Hallyn
       [not found]       ` <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-09 20:55 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-fsdevel, util-linux-ng

Quoting Miklos Szeredi (miklos@szeredi.hu):
> > > One thing that is missing from this series is the ability to restrict
> > > user mounts to private namespaces.  The reason is that private
> > > namespaces have still not gained the momentum and support needed for
> > > painless user experience.  So such a feature would not yet get enough
> > > attention and testing.  However adding such an optional restriction
> > > can be done with minimal changes in the future, once private
> > > namespaces have matured.
> > 
> > What is the main reason for that feature?  Would it be to prevent things
> > like login from being tricked by user mounts?  Isn't it sufficient, in
> > fact, better, to require that the target of the mount be owned by the
> > user doing the mount?
> 
> It's been discussed later in that thread.  Basically you can fool a

I see now, sorry.

> lot of system programs (like backup) with mounting/binding in the
> global namespace.  Restricting the destination doesn't always help.
> 
> Miklos

It would be nice in general if we could avoid any sort of checks for
(mnt->mnt_ns == init_nsproxy.mnt_ns).  Maybe that won't be possible,
but, taking the two listed examples:

1. mount --bind / ~/bindns;  (later) userdel hallyn

I assume userdel does a simple stupid rm -rf without first umounting,
then?  So (1) it seems wise to have userdel umount anything under ~user
first anyway, and (2) if $USER does a mount --bind from a source he
doesn't own, should we make the resulting mount read-only?  (realizing
the read-only bind mount patches are still under development :)  Or is
that overly restrictive somehow for fuse?

2. backups

Is this just a 'he's going to fill up the whole disk' issue?  Frankly,
it seems wise to have cron or whatever is spawning the backup start in
its own namespace right at boot.  Generally when I think back on sites
where I've dealt with backup, backups were done on a separate server
which didn't allow user logins anyway, so it wouldn't be a problem.  But
I'm sure that's a limited (==erroneous) POV.

I do realize that the whole problem about corner cases isn't addressing
two little ones, but the fact that there are more we haven't thought of.
So are there any currently known use cases where requiring a CLONE_NEWNS
before user mounts is unacceptable?

thanks,
-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]       ` <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
@ 2007-04-11 19:43         ` Miklos Szeredi
       [not found]           ` <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-11 19:43 UTC (permalink / raw)
  To: serue-r/Jw6+rmf7HQT0dZR+AlfA
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA

> It would be nice in general if we could avoid any sort of checks for
> (mnt->mnt_ns == init_nsproxy.mnt_ns).  Maybe that won't be possible,
> but, taking the two listed examples:

[snip]

It's probably worthwhile going after these problematic cases and
fixing them.  OTOH it's not easy to audit a complete system for holes
arising from user mounts in the global namespace.

So why not move this decision out from the kernel?  How about adding a
boolean flag to namespaces, which specifies whether unprivileged
mounts are allowed or not.  This would give complete flexibility to
distro builders and sysadmins.

The biggest problem I see is how to set this flag.  There's no easy
way to represent namespaces in /proc or /sys, and this is sufficiently
obscure not to warrant a new syscall.  Adding a new flag to prctl()
could do the trick.  Does that sound OK?
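
As a rough model of what such a per-namespace flag would mean (purely a
sketch: the struct and both function names are hypothetical, no prctl()
option for this exists, and requiring privilege to change the flag is my
assumption):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-namespace state. */
struct mnt_ns_model {
	bool allow_usermnt;	/* unprivileged mounts allowed here? */
};

/* What a prctl()-style setter might do; assumed to need privilege. */
int ns_set_usermnt(struct mnt_ns_model *ns, bool privileged, bool allow)
{
	if (!privileged)
		return -EPERM;
	ns->allow_usermnt = allow;
	return 0;
}

/* The check the mount path would then make for the caller's namespace. */
int ns_may_mount(const struct mnt_ns_model *ns, bool privileged)
{
	if (privileged)
		return 0;
	return ns->allow_usermnt ? 0 : -EPERM;
}
```

With the flag off by default, the initial namespace would behave as it
does today until a sysadmin explicitly turns it on.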

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]           ` <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-11 20:05             ` Serge E. Hallyn
  2007-04-11 20:41               ` Miklos Szeredi
  0 siblings, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-11 20:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> > It would be nice in general if we could avoid any sort of checks for
> > (mnt->mnt_ns == init_nsproxy.mnt_ns).  Maybe that won't be possible,
> > but, taking the two listed examples:
> 
> [snip]
> 
> It's probably worthwhile going after these problematic cases and
> fixing them.  OTOH it's not easy to audit a complete system for holes
> arising from user mounts in the global namespace.
> 
> So why not move this decision out from the kernel?  How about adding a
> boolean flag to namespaces, which specifies whether unprivileged
> mounts are allowed or not.  This would give complete flexibility to
> distro builders and sysadmins.
> 
> The biggest problem I see is how to set this flag.  There's no easy
> way to represent namespaces in /proc or /sys, and this is sufficiently
> obscure not to warrant a new syscall.  Adding a new flag to prctl()
> could do the trick.  Does that sound OK?

Not objecting to prctl(), but two other options would be

	1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
	   the time at which the ns is created, so in that sense it
	   makes sense.
	2. use the nsproxy container subsystem (see Paul Menage's
	   containers patchset) to set this using, e.g.,

	   	echo 1 > /containers/vserver1/mounts/usermount

The prctl() method has a huge advantage of being implementable right
now.

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-11 20:05             ` Serge E. Hallyn
@ 2007-04-11 20:41               ` Miklos Szeredi
  2007-04-11 20:57                 ` Serge E. Hallyn
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-11 20:41 UTC (permalink / raw)
  To: serue; +Cc: akpm, linux-fsdevel, util-linux-ng

> Not objecting to prctl(), but two other options would be
> 
> 	1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
> 	   the time at which the ns is created, so in that sense it
> 	   makes sense.

Yes, I thought about this, but there's no easy way to set the flag for
the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would
be needed to turn off the flag.

> 	2. use the nsproxy container subsystem (see Paul Menage's
> 	   containers patchset) to set this using, e.g.,
> 
> 	   	echo 1 > /containers/vserver1/mounts/usermount

That again would lose some flexibility: only namespaces which
are part of a container could be manipulated.  Does that exclude the
initial namespace?

Also how would a process find out which vserver it is running in?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-11 20:41               ` Miklos Szeredi
@ 2007-04-11 20:57                 ` Serge E. Hallyn
  0 siblings, 0 replies; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-11 20:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-fsdevel, util-linux-ng

Quoting Miklos Szeredi (miklos@szeredi.hu):
> > Not objecting to prctl(), but two other options would be
> > 
> > 	1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
> > 	   the time at which the ns is created, so in that sense it
> > 	   makes sense.
> 
> Yes, I thought about this, but there's no easy way to set the flag for
> the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would
> be needed to turn off the flag.

Not mentioning it would 'turn it off' for the cloned ns, but the default
value for the initial namespace is still a problem.

> > 	2. use the nsproxy container subsystem (see Paul Menage's
> > 	   containers patchset) to set this using, e.g.,
> > 
> > 	   	echo 1 > /containers/vserver1/mounts/usermount
> 
> That again would lose some flexibility: only namespaces which
> are part of a container could be manipulated.

In the nsproxy subsystem, every namespace gets a container so
long as the nsproxy subsystem is mounted.

> Does that exclude the
> initial namespace?

No, the initial namespace is tied to the root dentry - so if, as my
example was assuming, you've done

	mount -t container -o ns none /containers

then to change the setting for the initial namespace you would

	echo 0 > /containers/mounts/usermount

> Also how would a process find out which vserver it is running in?

cat /proc/$$/container

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
@ 2007-04-06 23:02   ` Andrew Morton
  2007-04-06 23:16     ` H. Peter Anvin
  2007-04-07  6:41     ` Miklos Szeredi
  2007-04-09 22:00   ` Serge E. Hallyn
  1 sibling, 2 replies; 54+ messages in thread
From: Andrew Morton @ 2007-04-06 23:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, 04 Apr 2007 20:30:12 +0200 Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote:

> This patchset adds support for keeping mount ownership information in
> the kernel, and allow unprivileged mount(2) and umount(2) in certain
> cases.

No replies, huh?

My knowledge of the code which you're touching is not strong, and my spare
reviewing capacity is not high.  And this work does need close review by
people who are familiar with the code which you're changing.

So could I suggest that you go for a dig through the git history, identify
some individuals who look like they know this code, then do a resend,
cc'ing those people?  Please also cc linux-kernel on that resend.

> This can be useful for the following reasons:
> 
> - mount(8) can store ownership ("user=XY" option) in the kernel
>   instead, or in addition to storing it in /etc/mtab.  For example if
>   private namespaces are used with mount propagations /etc/mtab
>   becomes unworkable, but using /proc/mounts works fine
> 
> - fuse won't need a special suid-root mount/umount utility.  Plain
>   umount(8) can easily be made to work with unprivileged fuse mounts
> 
> - users can use bind mounts without having to pre-configure them in
>   /etc/fstab
> 
> All this is done in a secure way, and unprivileged bind and fuse
> mounts are disabled by default and can be enabled through sysctl or
> /proc/sys.
> 
> One thing that is missing from this series is the ability to restrict
> user mounts to private namespaces.  The reason is that private
> namespaces have still not gained the momentum and support needed for
> painless user experience.  So such a feature would not yet get enough
> attention and testing.  However adding such an optional restriction
> can be done with minimal changes in the future, once private
> namespaces have matured.

I suspect the people who developed and maintain nsproxy would disagree ;)

Please also cc containers-qjLDD68F18NYIhldQZh9Cg@public.gmane.org

> An earlier version of these patches have been discussed here:
> 
>   http://lkml.org/lkml/2005/5/3/64
> 
> --

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-06 23:02   ` Andrew Morton
@ 2007-04-06 23:16     ` H. Peter Anvin
  2007-04-06 23:55       ` Jan Engelhardt
  2007-04-10  8:52       ` Ian Kent
  2007-04-07  6:41     ` Miklos Szeredi
  1 sibling, 2 replies; 54+ messages in thread
From: H. Peter Anvin @ 2007-04-06 23:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, linux-fsdevel, util-linux-ng, containers,
	linux-kernel

>>
>> - users can use bind mounts without having to pre-configure them in
>>   /etc/fstab
>>

This is by far the biggest concern I see.  I think the security
implications of allowing anyone to do bind mounts are poorly understood.

	-hpa

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-06 23:16     ` H. Peter Anvin
@ 2007-04-06 23:55       ` Jan Engelhardt
  2007-04-07  0:22         ` H. Peter Anvin
  2007-04-10  8:52       ` Ian Kent
  1 sibling, 1 reply; 54+ messages in thread
From: Jan Engelhardt @ 2007-04-06 23:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng,
	containers, linux-kernel


On Apr 6 2007 16:16, H. Peter Anvin wrote:
>> > 
>> > - users can use bind mounts without having to pre-configure them in
>> > /etc/fstab
>> > 
>
> This is by far the biggest concern I see.  I think the security implications of
> allowing anyone to do bind mounts are poorly understood.

$ whoami
miklos
$ mount --bind / ~/down_under

later that day:
# userdel -r miklos

So both the source (/) and target (~/down_under) directory must be owned 
by the user before --bind may succeed.
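
Such an ownership rule could look roughly like this in user-space terms.
It is only a sketch of the rule suggested above, not a check taken from
the patchset, and both function names are hypothetical:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return 1 if path exists and is owned by uid, else 0. */
int owned_by(const char *path, uid_t uid)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	return st.st_uid == uid;
}

/* Allow a bind mount only if both ends belong to the caller. */
int bind_allowed_model(const char *source, const char *target, uid_t uid)
{
	return owned_by(source, uid) && owned_by(target, uid);
}
```

Under this rule the "mount --bind / ~/down_under" above would be refused
for an ordinary user, since / is not owned by miklos.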

There may be other implications hpa might want to fill us in on.

Regards,
Jan
-- 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-06 23:55       ` Jan Engelhardt
@ 2007-04-07  0:22         ` H. Peter Anvin
  2007-04-07  3:40           ` Eric Van Hensbergen
  0 siblings, 1 reply; 54+ messages in thread
From: H. Peter Anvin @ 2007-04-07  0:22 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng,
	containers, linux-kernel

Jan Engelhardt wrote:
> On Apr 6 2007 16:16, H. Peter Anvin wrote:
>>>> - users can use bind mounts without having to pre-configure them in
>>>> /etc/fstab
>>>>
>> This is by far the biggest concern I see.  I think the security implications of
>> allowing anyone to do bind mounts are poorly understood.
> 
> $ whoami
> miklos
> $ mount --bind / ~/down_under
> 
> later that day:
> # userdel -r miklos
> 
> So both the source (/) and target (~/down_under) directory must be owned 
> by the user before --bind may succeed.
> 
> There may be other implications hpa might want to fill us in on.

Consider backups, for example.

	-hpa

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-07  0:22         ` H. Peter Anvin
@ 2007-04-07  3:40           ` Eric Van Hensbergen
       [not found]             ` <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Eric Van Hensbergen @ 2007-04-07  3:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jan Engelhardt, Andrew Morton, Miklos Szeredi, linux-fsdevel,
	util-linux-ng, containers, linux-kernel

On 4/6/07, H. Peter Anvin <hpa@zytor.com> wrote:
> Jan Engelhardt wrote:
> > On Apr 6 2007 16:16, H. Peter Anvin wrote:
> >>>> - users can use bind mounts without having to pre-configure them in
> >>>> /etc/fstab
> >>>>
> >> This is by far the biggest concern I see.  I think the security implications of
> >> allowing anyone to do bind mounts are poorly understood.
> >
> > $ whoami
> > miklos
> > $ mount --bind / ~/down_under
> >
> > later that day:
> > # userdel -r miklos
> >
>
> Consider backups, for example.
>

This is the reason why enforcing private namespaces for user mounts
makes sense.  I think it catches many of these corner cases.

          -eric

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]             ` <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2007-04-07  6:48               ` Miklos Szeredi
  0 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-07  6:48 UTC (permalink / raw)
  To: ericvh-Re5JQEeQqe8AvxtiuMwx3w
  Cc: hpa-YMNOUZJC4hwAvxtiuMwx3w, jengelh-CujU1KeUx2fb/Wh9oZwLjA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> On 4/6/07, H. Peter Anvin <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> wrote:
> > Jan Engelhardt wrote:
> > > On Apr 6 2007 16:16, H. Peter Anvin wrote:
> > >>>> - users can use bind mounts without having to pre-configure them in
> > >>>> /etc/fstab
> > >>>>
> > >> This is by far the biggest concern I see.  I think the security implications of
> > >> allowing anyone to do bind mounts are poorly understood.
> > >
> > > $ whoami
> > > miklos
> > > $ mount --bind / ~/down_under
> > >
> > > later that day:
> > > # userdel -r miklos
> > >
> >
> > Consider backups, for example.
> >
> 
> This is the reason why enforcing private namespaces for user mounts
> makes sense.  I think it catches many of these corner cases.

Yes, disabling user bind mounts in the global namespace makes sense.

Enabling user fuse mounts in the global namespace still works though,
even if a little kludgy.  All these nasty corner cases have been
thought through and validated by a lot of users.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-06 23:16     ` H. Peter Anvin
  2007-04-06 23:55       ` Jan Engelhardt
@ 2007-04-10  8:52       ` Ian Kent
       [not found]         ` <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
  1 sibling, 1 reply; 54+ messages in thread
From: Ian Kent @ 2007-04-10  8:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng,
	containers, linux-kernel

On Fri, 2007-04-06 at 16:16 -0700, H. Peter Anvin wrote:
> >>
> >> - users can use bind mounts without having to pre-configure them in
> >>   /etc/fstab
> >>
> 
> This is by far the biggest concern I see.  I think the security 
> implications of allowing anyone to do bind mounts are poorly understood.

And especially so since there is no way for a filesystem module to veto
such requests.

Ian



^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]         ` <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
@ 2007-04-11 10:48           ` Miklos Szeredi
  2007-04-11 13:48             ` Ian Kent
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-11 10:48 UTC (permalink / raw)
  To: raven-PKsaG3nR2I+sTnJN9+BGXg
  Cc: hpa-YMNOUZJC4hwAvxtiuMwx3w, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> > >>
> > >> - users can use bind mounts without having to pre-configure them in
> > >>   /etc/fstab
> > >>
> > 
> > This is by far the biggest concern I see.  I think the security 
> > implications of allowing anyone to do bind mounts are poorly understood.
> 
> And especially so since there is no way for a filesystem module to veto
> such requests.

The filesystem can't veto initial mounts based on destination either.
I don't think it's up to the filesystem to police bind/move mounts in
any way.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-11 10:48           ` Miklos Szeredi
@ 2007-04-11 13:48             ` Ian Kent
       [not found]               ` <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Ian Kent @ 2007-04-11 13:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: hpa, akpm, linux-fsdevel, util-linux-ng, containers, linux-kernel

On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > >>
> > > >> - users can use bind mounts without having to pre-configure them in
> > > >>   /etc/fstab
> > > >>
> > > 
> > > This is by far the biggest concern I see.  I think the security 
> > > implications of allowing anyone to do bind mounts are poorly understood.
> > 
> > And especially so since there is no way for a filesystem module to veto
> > such requests.
> 
> The filesystem can't veto initial mounts based on destination either.
> I don't think it's up to the filesystem to police bind/move mounts in
> any way.

But if a filesystem can't, or the developer thinks that it shouldn't for
some reason, support bind/move mounts, then there should be a way for the
filesystem to tell the kernel that.

Surely a filesystem is in a good position to be able to decide if a
mount request "for it" should be allowed to continue based on its "own
situation and capabilities".

Ian

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]               ` <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
@ 2007-04-11 14:26                 ` Serge E. Hallyn
       [not found]                   ` <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-11 14:26 UTC (permalink / raw)
  To: Ian Kent
  Cc: Miklos Szeredi, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org):
> On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > >>
> > > > >> - users can use bind mounts without having to pre-configure them in
> > > > >>   /etc/fstab
> > > > >>
> > > > 
> > > > This is by far the biggest concern I see.  I think the security 
> > > > implications of allowing anyone to do bind mounts are poorly understood.
> > > 
> > > And especially so since there is no way for a filesystem module to veto
> > > such requests.
> > 
> > The filesystem can't veto initial mounts based on destination either.
> > I don't think it's up to the filesystem to police bind/move mounts in
> > any way.
> 
> But if a filesystem can't, or the developer thinks that it shouldn't for
> some reason, support bind/move mounts, then there should be a way for the

Can you list some valid reasons why an fs could care where it is
mounted?  The only thing I could think of is a stackable fs, but it
shouldn't care whether it is overlay-mounted or not.

thanks,
-serge

> filesystem to tell the kernel that.
> 
> Surely a filesystem is in a good position to be able to decide if a
> mount request "for it" should be allowed to continue based on its "own
> situation and capabilities".
> 
> Ian
> 
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                   ` <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
@ 2007-04-11 14:27                     ` Ian Kent
       [not found]                       ` <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Ian Kent @ 2007-04-11 14:27 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Miklos Szeredi, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote:
> Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org):
> > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > > >>
> > > > > >> - users can use bind mounts without having to pre-configure them in
> > > > > >>   /etc/fstab
> > > > > >>
> > > > > 
> > > > > This is by far the biggest concern I see.  I think the security
> > > > > implications of allowing anyone to do bind mounts are poorly understood.
> > > > 
> > > > And especially so since there is no way for a filesystem module to veto
> > > > such requests.
> > > 
> > > The filesystem can't veto initial mounts based on destination either.
> > > I don't think it's up to the filesystem to police bind/move mounts in
> > > any way.
> > 
> > But if a filesystem can't or the developer thinks that it shouldn't for
> > some reason, support bind/move mounts then there should be a way for the
> 
> Can you list some valid reasons why an fs could care where it is
> mounted?  The only thing I could think of is a stackable fs, but it
> shouldn't care whether it is overlay-mounted or not.

For my part, autofs and autofs4.
Moving or binding isn't valid.
I tried to design that limitation out of version 5 but wasn't able to.
In time I probably can but couldn't continue to support older versions.

> 
> thanks,
> -serge
> 
> > filesystem to tell the kernel that.
> > 
> > Surely a filesystem is in a good position to be able to decide if a
> > mount request "for it" should be allowed to continue based on its "own
> > situation and capabilities".
> > 
> > Ian
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                       ` <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
@ 2007-04-11 14:45                         ` Serge E. Hallyn
  0 siblings, 0 replies; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-11 14:45 UTC (permalink / raw)
  To: Ian Kent
  Cc: Miklos Szeredi, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org):
> On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote:
> > Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org):
> > > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > > > >>
> > > > > > >> - users can use bind mounts without having to pre-configure them in
> > > > > > >>   /etc/fstab
> > > > > > >>
> > > > > > 
> > > > > > This is by far the biggest concern I see.  I think the security
> > > > > > implications of allowing anyone to do bind mounts are poorly understood.
> > > > > 
> > > > > And especially so since there is no way for a filesystem module to veto
> > > > > such requests.
> > > > 
> > > > The filesystem can't veto initial mounts based on destination either.
> > > > I don't think it's up to the filesystem to police bind/move mounts in
> > > > any way.
> > > 
> > > But if a filesystem can't or the developer thinks that it shouldn't for
> > > some reason, support bind/move mounts then there should be a way for the
> > 
> > Can you list some valid reasons why an fs could care where it is
> > mounted?  The only thing I could think of is a stackable fs, but it
> > shouldn't care whether it is overlay-mounted or not.
> 
> For my part, autofs and autofs4.

Ah, thanks.

I can see I'm going to have to start using autofs to get to know the
implementation, because it seems clear we'll run into it in the
containers work again (beyond the struct pid conv) at some point.

> Moving or binding isn't valid.
> I tried to design that limitation out of version 5 but wasn't able to.
> In time I probably can but couldn't continue to support older versions.

thanks,
-serge

> > 
> > thanks,
> > -serge
> > 
> > > filesystem to tell the kernel that.
> > > 
> > > Surely a filesystem is in a good position to be able to decide if a
> > > mount request "for it" should be allowed to continue based on its "own
> > > situation and capabilities".
> > > 
> > > Ian
> > > 
> > > 
> > > 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-06 23:02   ` Andrew Morton
  2007-04-06 23:16     ` H. Peter Anvin
@ 2007-04-07  6:41     ` Miklos Szeredi
       [not found]       ` <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  1 sibling, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-07  6:41 UTC (permalink / raw)
  To: akpm; +Cc: linux-fsdevel, util-linux-ng, containers, linux-kernel

> > This patchset adds support for keeping mount ownership information in
> > the kernel, and allow unprivileged mount(2) and umount(2) in certain
> > cases.
> 
> No replies, huh?

All we need is a comment from Andrew, and the replies come flooding in ;)

> My knowledge of the code which you're touching is not strong, and my spare
> reviewing capacity is not high.  And this work does need close review by
> people who are familiar with the code which you're changing.
> 
> So could I suggest that you go for a dig through the git history, identify
> some individuals who look like they know this code, then do a resend,
> cc'ing those people?  Please also cc linux-kernel on that resend.

OK.

> > One thing that is missing from this series is the ability to restrict
> > user mounts to private namespaces.  The reason is that private
> > namespaces have still not gained the momentum and support needed for
> > painless user experience.  So such a feature would not yet get enough
> > attention and testing.  However adding such an optional restriction
> > can be done with minimal changes in the future, once private
> > namespaces have matured.
> 
> I suspect the people who developed and maintain nsproxy would disagree ;)

Well, they better show me some working and simple-to-use userspace
code, because I've not seen anything like that related to mount
namespaces.

pam_namespace.so is one example of a non-working, but probably-not-too-
hard-to-fix one.

I'm just saying this is not yet something that Joe Blow would just
enable by ticking a box in their desktop setup wizard, and it would
all work flawlessly thereafter.  There's still a _long_ way towards
that, and mostly in userspace.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]       ` <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-09 14:38         ` Serge E. Hallyn
       [not found]           ` <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-09 14:38 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> > > This patchset adds support for keeping mount ownership information in
> > > the kernel, and allow unprivileged mount(2) and umount(2) in certain
> > > cases.
> > 
> > No replies, huh?
> 
> All we need is a comment from Andrew, and the replies come flooding in ;)
> 
> > My knowledge of the code which you're touching is not strong, and my spare
> > reviewing capacity is not high.  And this work does need close review by
> > people who are familiar with the code which you're changing.
> > 
> > So could I suggest that you go for a dig through the git history, identify
> > some individuals who look like they know this code, then do a resend,
> > cc'ing those people?  Please also cc linux-kernel on that resend.
> 
> OK.
> 
> > > One thing that is missing from this series is the ability to restrict
> > > user mounts to private namespaces.  The reason is that private
> > > namespaces have still not gained the momentum and support needed for
> > > painless user experience.  So such a feature would not yet get enough
> > > attention and testing.  However adding such an optional restriction
> > > can be done with minimal changes in the future, once private
> > > namespaces have matured.
> > 
> > I suspect the people who developed and maintain nsproxy would disagree ;)
> 
> Well, they better show me some working and simple-to-use userspace
> code, because I've not seen anything like that related to mount
> namespaces.

If you mean to test/exploit them, see
http://lxc.sourceforge.net/patches/2.6.20/2.6.20-lxc8/broken-out/tests/

Compile the ns_exec.c program and do

	ns_exec -m /bin/sh

to get a shell in a new mounts namespace.
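
For readers without ns_exec.c at hand, roughly the same effect can be
sketched with unshare(1) from later util-linux releases, a tool that
postdates this thread.  The function name enter_mount_ns and the DRY_RUN
switch are made up for illustration, and creating a mounts namespace
normally requires root (CAP_SYS_ADMIN):

```shell
# Sketch only: assumes util-linux unshare(1), which did not exist when
# this thread was written.  enter_mount_ns and DRY_RUN are hypothetical.
enter_mount_ns() {
    # Default to a shell when no command is given, like "ns_exec -m /bin/sh".
    [ $# -gt 0 ] || set -- /bin/sh
    # With DRY_RUN set, only print the command instead of running it.
    ${DRY_RUN:+echo} unshare --mount -- "$@"
}
```

Running "DRY_RUN=1 enter_mount_ns" prints the command that would be
executed; without DRY_RUN, mounts made in the resulting shell live in the
new namespace, subject to the propagation settings of the mounts involved.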

> pam_namespace.so is one example of a non-working, but probably-not-too-
> hard-to-fix one.

Non-working?  I sure hope the one used for LSPP certification is
working...  As is the ugly version I wrote 18 months ago and use on my
laptop.

> I'm just saying this is not yet something that Joe Blow would just
> enable by ticking a box in their desktop setup wizard, and it would
> all work flawlessly thereafter.  There's still a _long_ way towards
> that, and mostly in userspace.

I'm not sure there's that long a way to go, but clearly we need to be
showing users what they can do, or they'll never work their way towards
there.

For instance, as you say, a user admin gui with a checkmark and text
boxes saying 'enter new namespace on login', 'create private /tmp',
and 'create private dmcrypted /home' would be trivial right now.

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]           ` <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
@ 2007-04-09 16:24             ` Miklos Szeredi
  2007-04-09 17:07               ` Serge E. Hallyn
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-09 16:24 UTC (permalink / raw)
  To: serue-r/Jw6+rmf7HQT0dZR+AlfA
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> > > > One thing that is missing from this series is the ability to restrict
> > > > user mounts to private namespaces.  The reason is that private
> > > > namespaces have still not gained the momentum and support needed for
> > > > painless user experience.  So such a feature would not yet get enough
> > > > attention and testing.  However adding such an optional restriction
> > > > can be done with minimal changes in the future, once private
> > > > namespaces have matured.
> > > 
> > > I suspect the people who developed and maintain nsproxy would disagree ;)
> > 
> > Well, they better show me some working and simple-to-use userspace
> > code, because I've not seen anything like that related to mount
> > namespaces.
> 
> If you mean to test/exploit them, see
> http://lxc.sourceforge.net/patches/2.6.20/2.6.20-lxc8/broken-out/tests/
> 
> Compile the ns_exec.c program and do
> 
> 	ns_exec -m /bin/sh
> 
> to get a shell in a new mounts namespace.

Cool, thanks.  This is a very nice utility for testing, but for the
end user rather useless:

  - user starts up a private namespace in a shell, mounts something

  - then opens app from menu, tries to access mount, but the mount is
    not there

  - user unhappy
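
The scenario above comes down to the shell and the menu-launched
application living in different mounts namespaces.  On kernels much newer
than the ones discussed here this can be checked by comparing the
/proc/PID/ns/mnt links.  A minimal sketch, where same_mount_ns and the
PROC_ROOT override are hypothetical names used only for this example:

```shell
# Sketch only: relies on /proc/<pid>/ns/mnt, which appeared in kernels
# long after this thread.  PROC_ROOT exists so the check can be tried
# against a fake directory tree.
same_mount_ns() {
    smn_proc=${PROC_ROOT:-/proc}
    # Each link reads like "mnt:[4026531840]"; equal targets mean both
    # processes share one mounts namespace.
    smn_a=$(readlink "$smn_proc/$1/ns/mnt") || return 2
    smn_b=$(readlink "$smn_proc/$2/ns/mnt") || return 2
    [ "$smn_a" = "$smn_b" ]
}
```

For instance, "same_mount_ns $$ <pid of the application>" would return
non-zero for an application started from the desktop menu while the
shell sits in its own private namespace.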

BTW, looking at -mm, unshare() on a namespace is not privileged any more.
Why is that?  Or rather, what's the reason that clone() is privileged
and unshare() is not?

> > pam_namespace.so is one example of a non-working, but probably-not-too-
> > hard-to-fix one.
> 
> Non-working?  I sure hope the one used for LSPP certification is
> working...  As is the ugly version I wrote 18 months ago and use on my
> laptop.

The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken.  Are
you interested in the details?  I can reproduce it, but forgot to note
down the details of the brokenness.

> > I'm just saying this is not yet something that Joe Blow would just
> > enable by ticking a box in their desktop setup wizard, and it would
> > all work flawlessly thereafter.  There's still a _long_ way towards
> > that, and mostly in userspace.
> 
> I'm not sure there's that long a way to go, but clearly we need to be
> showing users what they can do, or they'll never work their way towards
> there.

There _is_ a long way to go.  Random things that spring to my mind:

 - using /etc/mtab is broken with private namespaces, using
   /proc/mounts is missing various functionality, that /etc/mtab has,
   for example the "user" option, which this patchset adds

 - need to set up mount propagation from global namespace to private
   ones, mount(8) does not yet have options to configure propagation

 - user namespace setup: what if user has multiple sessions?

   1) namespaces are shared?  That's tricky because the session needs to
   be a child of a namespace server, not of login.  I'm not sure PAM
   can handle this

   2) or mounts are copied on login?  That's not possible currently,
   as there's no way to send a mount between namespaces.  Also it's
   tricky to make sure that new mounts are also shared

> For instance, as you say, a user admin gui with a checkmark and text
> boxes saying 'enter new namespace on login', 'create private /tmp',
> and 'create private dmcrypted /home' would be trivial right now.

Trivial modulo the above slightly non-trivial exceptions ;)

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 16:24             ` Miklos Szeredi
@ 2007-04-09 17:07               ` Serge E. Hallyn
  2007-04-09 17:46                 ` Ram Pai
  2007-04-09 20:10                 ` Miklos Szeredi
  0 siblings, 2 replies; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-09 17:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel, Ram Pai

Quoting Miklos Szeredi (miklos@szeredi.hu):
> > > > > One thing that is missing from this series is the ability to restrict
> > > > > user mounts to private namespaces.  The reason is that private
> > > > > namespaces have still not gained the momentum and support needed for
> > > > > painless user experience.  So such a feature would not yet get enough
> > > > > attention and testing.  However adding such an optional restriction
> > > > > can be done with minimal changes in the future, once private
> > > > > namespaces have matured.
> > > > 
> > > > I suspect the people who developed and maintain nsproxy would disagree ;)
> > > 
> > > Well, they better show me some working and simple-to-use userspace
> > > code, because I've not seen anything like that related to mount
> > > namespaces.
> > 
> > If you mean to test/exploit them, see
> > http://lxc.sourceforge.net/patches/2.6.20/2.6.20-lxc8/broken-out/tests/
> > 
> > Compile the ns_exec.c program and do
> > 
> > 	ns_exec -m /bin/sh
> > 
> > to get a shell in a new mounts namespace.
> 
> Cool, thanks.  This is a very nice utility for testing, but for the
> end user rather useless:

Well that depends on which end-user.  Those wanting to create a vserver
or checkpoint-restart job will want this, but clearly we have a long way
to go for that upstream anyway.

>   - user starts up a private namespace in a shell, mounts something
> 
>   - then opens app from menu, tries to access mount, but the mount is
>     not there
> 
>   - user unhappy
> 
> BTW, looking at -mm unshare() on namespace is not privileged any more.
> Why is that?  Or rather, what's the reason, that clone() is privileged
> and unshare() is not?

The check is still there - see kernel/nsproxy.c:unshare_nsproxy_namespaces().

> > > pam_namespace.so is one example of a non-working, but probably-not-too-
> > > hard-to-fix one.
> > 
> > Non-working?  I sure hope the one used for LSPP certification is
> > working...  As is the ugly version I wrote 18 months ago and use on my
> > laptop.
> 
> The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken.  Are
> you interested in the details?  I can reproduce it, but forgot to note
> down the details of the brokenness.

I don't know how far removed that is from the one being used by redhat,
but assuming it's the same, then redhat-lspp@redhat.com will be
very interested.

> > > I'm just saying this is not yet something that Joe Blow would just
> > > enable by ticking a box in their desktop setup wizard, and it would
> > > all work flawlessly thereafter.  There's still a _long_ way towards
> > > that, and mostly in userspace.
> > 
> > I'm not sure there's that long a way to go, but clearly we need to be
> > showing users what they can do, or they'll never work their way towards
> > there.
> 
> There _is_ a long way to go.  Random things that spring to my mind:
> 
>  - using /etc/mtab is broken with private namespaces, using
>    /proc/mounts is missing various functionality, that /etc/mtab has,
>    for example the "user" option, which this patchset adds

Agreed those need fixing.

>  - need to set up mount propagation from global namespace to private
>    ones, mount(8) does not yet have options to configure propagation

Hmm, I guess I get lost using my own little systems, and just assumed
that shared subtree functionality was making its way up into mount(8).
Ram, have you been working on that?

>  - user namespace setup: what if user has multiple sessions?
> 
>    1) namespaces are shared?  That's tricky because the session needs to
>    be a child of a namespace server, not of login.  I'm not sure PAM
>    can handle this
> 
>    2) or mounts are copied on login?  That's not possible currently,
>    as there's no way to send a mount between namespaces.  Also it's
>    tricky to make sure that new mounts are also shared

See toward the end of the 'shared subtrees' OLS paper from last year for
a suggestion on how to let users effectively 'log in to' an existing
private mounts ns.

> > For instance, as you say, a user admin gui with a checkmark and text
> > boxes saying 'enter new namespace on login', 'create private /tmp',
> > and 'create private dmcrypted /home' would be trivial right now.
> 
> Trivial modulo the above slightly non-trivial exceptions ;)

Ok, so it can use some very non-trivial fine-tuning...

But I've been using the above - minus the trivial gui - for over a year
without ever worrying about any of these short-comings.

> Miklos

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 17:07               ` Serge E. Hallyn
@ 2007-04-09 17:46                 ` Ram Pai
  2007-04-09 18:25                   ` H. Peter Anvin
  2007-04-10 10:33                   ` Karel Zak
  2007-04-09 20:10                 ` Miklos Szeredi
  1 sibling, 2 replies; 54+ messages in thread
From: Ram Pai @ 2007-04-09 17:46 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Miklos Szeredi, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel

On Mon, 2007-04-09 at 12:07 -0500, Serge E. Hallyn wrote:
> Quoting Miklos Szeredi (miklos@szeredi.hu):

> >  - need to set up mount propagation from global namespace to private
> >    ones, mount(8) does not yet have options to configure propagation
> 
> Hmm, I guess I get lost using my own little systems, and just assumed
> that shared subtree functionality was making its way up into mount(8).
> Ram, have you been working on that?

It is in FC6. I don't know the status of upstream util-linux. I did
submit the patch many times to Adrian Bunk (the then util-linux
maintainer) and got no response. I have not pushed the patches to the
new maintainer (Karel Zak?) though.

RP


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 17:46                 ` Ram Pai
@ 2007-04-09 18:25                   ` H. Peter Anvin
  2007-04-10 10:33                   ` Karel Zak
  1 sibling, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2007-04-09 18:25 UTC (permalink / raw)
  To: Ram Pai
  Cc: Serge E. Hallyn, Miklos Szeredi, akpm, linux-fsdevel, containers,
	util-linux-ng, linux-kernel

Ram Pai wrote:
> 
> It is in FC6. I don't know the status of upstream util-linux. I did
> submit the patch many times to Adrian Bunk (the then util-linux
> maintainer) and got no response. I have not pushed the patches to the
> new maintainer (Karel Zak?) though.
> 

Well, do that, then :)

Seriously.  The whole point of util-linux-ng is to make forward progress.

	-hpa

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 17:46                 ` Ram Pai
  2007-04-09 18:25                   ` H. Peter Anvin
@ 2007-04-10 10:33                   ` Karel Zak
  1 sibling, 0 replies; 54+ messages in thread
From: Karel Zak @ 2007-04-10 10:33 UTC (permalink / raw)
  To: Ram Pai
  Cc: Serge E. Hallyn, Miklos Szeredi, akpm, linux-fsdevel, containers,
	util-linux-ng, linux-kernel

On Mon, Apr 09, 2007 at 10:46:25AM -0700, Ram Pai wrote:
> On Mon, 2007-04-09 at 12:07 -0500, Serge E. Hallyn wrote:
> > Quoting Miklos Szeredi (miklos@szeredi.hu):
> 
> > >  - need to set up mount propagation from global namespace to private
> > >    ones, mount(8) does not yet have options to configure propagation
> > 
> > Hmm, I guess I get lost using my own little systems, and just assumed
> > that shared subtree functionality was making its way up into mount(8).
> > Ram, have you been working on that?
> 
> It is in FC6. I don't know the status of upstream util-linux. I did
> submit the patch many times to Adrian Bunk (the then util-linux
> maintainer) and got no response. I have not pushed the patches to the
> new maintainer (Karel Zak?) though.

 The "shared-subtree" patch has been applied:
 http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=commitdiff;h=389fbea536e4308d9475fa2a89e53e188ce8a0e3;hp=939a997de0c761d29fb7530976ca20da4898703a

 
    Karel

-- 
 Karel Zak  <kzak@redhat.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 17:07               ` Serge E. Hallyn
  2007-04-09 17:46                 ` Ram Pai
@ 2007-04-09 20:10                 ` Miklos Szeredi
  2007-04-10  8:38                   ` Ram Pai
  1 sibling, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-09 20:10 UTC (permalink / raw)
  To: serue
  Cc: akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel,
	linuxram

> > The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken.  Are
> > you interested in the details?  I can reproduce it, but forgot to note
> > down the details of the brokenness.
> 
> I don't know how far removed that is from the one being used by redhat,
> but assuming it's the same, then redhat-lspp@redhat.com will be
> very interested.

OK.

> >  - user namespace setup: what if user has multiple sessions?
> > 
> >    1) namespaces are shared?  That's tricky because the session needs to
> >    be a child of a namespace server, not of login.  I'm not sure PAM
> >    can handle this
> > 
> >    2) or mounts are copied on login?  That's not possible currently,
> >    as there's no way to send a mount between namespaces.  Also it's
> >    tricky to make sure that new mounts are also shared
> 
> See toward the end of the 'shared subtrees' OLS paper from last year for
> a suggestion on how to let users effectively 'log in to' an existing
> private mounts ns.

This?

  1. create a new namespace
  2. bind /share/$USER to /share
  3. for each pair ($who, $what) such that
     /share/$USER/$who/$what exists, look
     in /share/$who/allowed for "peer $what
     $USER" or "slave $what $USER". If the
     former is found, rbind /share/$who/$what
     on /share/$USER/$who/$what; if the
     latter is found, do the same and
     follow with marking subtree under
     /share/$USER/$who/$what as slave.
  4. rbind /share/$USER to /share
  5. mark subtree under /share as private.
  6. umount -l /share

Well, someone please explain using short words, because I don't
understand at all.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 20:10                 ` Miklos Szeredi
@ 2007-04-10  8:38                   ` Ram Pai
  2007-04-11 10:44                     ` Miklos Szeredi
  0 siblings, 1 reply; 54+ messages in thread
From: Ram Pai @ 2007-04-10  8:38 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel

On Mon, 2007-04-09 at 22:10 +0200, Miklos Szeredi wrote:
> > > The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken.  Are
> > > you interested in the details?  I can reproduce it, but forgot to note
> > > down the details of the brokenness.
> > 
> > I don't know how far removed that is from the one being used by redhat,
> > but assuming it's the same, then redhat-lspp@redhat.com will be
> > very interested.
> 
> OK.
> 
> > >  - user namespace setup: what if user has multiple sessions?
> > > 
> > >    1) namespaces are shared?  That's tricky because the session needs to
> > >    be a child of a namespace server, not of login.  I'm not sure PAM
> > >    can handle this
> > > 
> > >    2) or mounts are copied on login?  That's not possible currently,
> > >    as there's no way to send a mount between namespaces.  Also it's
> > >    tricky to make sure that new mounts are also shared
> > 
> > See toward the end of the 'shared subtrees' OLS paper from last year for
> > a suggestion on how to let users effectively 'log in to' an existing
> > private mounts ns.
> 
> This?
> 
>   1. create a new namespace
>   2. bind /share/$USER to /share
>   3. for each pair ($who, $what) such that
>      /share/$USER/$who/$what exists, look
>      in /share/$who/allowed for "peer $what
>      $USER" or "slave $what $USER". If the
>      former is found, rbind /share/$who/$what
>      on /share/$USER/$who/$what; if the
>      latter is found, do the same and
>      follow with marking subtree under
>      /share/$USER/$who/$what as slave.
>   4. rbind /share/$USER to /share
>   5. mark subtree under /share as private.
>   6. umount -l /share
> 
> Well, someone please explain using short words, because I don't
> understand at all.

I am trying to reconstruct Viro's thoughts.  I think the steps outlined
above, though not accurate, are still insightful.

The idea is that there is one master namespace which has, under /share,
a replica of the mount tree of the namespaces belonging to all users.

For example, if there are two users A and B, then in the master namespace
under /share you will find /share/A and /share/B, each reflecting the
mount tree for the namespaces belonging to user A and user B
respectively.

Note: /share is a shared mount tree, which means it can propagate mount
events.

Every time the user logs on to the machine, a new namespace is created
which is a clone of the master namespace.  In this new namespace,
/share/$user is made the root of the namespace.  Also, if other users
have made part of their namespace available to this user, then those
mounts are also brought under this namespace.  And finally the entire
tree under /share is unmounted.

Note that though multiple namespaces can exist simultaneously for the
same user, the user is provided the illusion of per-process-namespace
since all the namespaces look identical.

I am trying to rewrite the steps outlined above, which may or may not
reflect Viro's thoughts, but certainly reflect my reconstruction of
Viro's thoughts.

1. clone the master namespace.

2. in the new namespace

        move the tree under /share/$me to /
        for each ($user, $what, $how) {
            move /share/$user/$what to /$what
            if ($how == slave) {
                make the mount tree under /$what as slave
            }
        }

3. in the new namespace make the tree under 
       /share as private and unmount /share

RP

> 
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-10  8:38                   ` Ram Pai
@ 2007-04-11 10:44                     ` Miklos Szeredi
       [not found]                       ` <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-11 10:44 UTC (permalink / raw)
  To: linuxram
  Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel

> 1. clone the master namespace.
> 
> 2. in the new namespace
> 
> 	move the tree under /share/$me to /
>         for each ($user, $what, $how) {
>             move /share/$user/$what to /$what
> 	    if ($how == slave) {
>                  make the mount tree under /$what as slave
>             }
>         }
>         
> 3. in the new namespace make the tree under 
>        /share as private and unmount /share

Thanks.  I get the basic idea now: the namespace itself need not be
shared between the sessions, it is enough if "share" propagation is
set up between the different namespaces of a user.
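
Whether a given mount takes part in such propagation can be read from
the optional fields of /proc/self/mountinfo, a file that only exists on
kernels newer than the ones this thread targets.  A small sketch, where
propagation_of and the MOUNTINFO override are illustrative names and not
part of any existing tool:

```shell
# Sketch only: assumes the /proc/self/mountinfo format of later kernels,
# where field 5 is the mount point and the optional fields (shared:N,
# master:N, ...) run from field 7 up to a lone "-" separator.
propagation_of() {
    awk -v mp="$1" '
        $5 == mp {
            tags = ""
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            # No optional fields means the mount is private.
            state = (tags == "" ? "private" : tags)
        }
        END { if (state != "") print state; else exit 1 }
    ' "${MOUNTINFO:-/proc/self/mountinfo}"
}
```

A mount that is a slave of one peer group and shared within another would
be reported with both a "shared:N" and a "master:N" tag; an unknown mount
point makes the function return non-zero.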

I don't yet see either in your or Viro's description how the trees
under /share/$USER are initialized.  I guess they are recursively
bound from /, and are made slaves.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                       ` <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-11 18:28                         ` Ram Pai
       [not found]                           ` <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Ram Pai @ 2007-04-11 18:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > 1. clone the master namespace.
> > 
> > 2. in the new namespace
> > 
> > 	move the tree under /share/$me to /
> >         for each ($user, $what, $how) {
> >             move /share/$user/$what to /$what
> > 	    if ($how == slave) {
> >                  make the mount tree under /$what as slave
> >             }
> >         }
> >         
> > 3. in the new namespace make the tree under 
> >        /share as private and unmount /share
> 
> Thanks.  I get the basic idea now: the namespace itself need not be
> shared between the sessions, it is enough if "share" propagation is
> set up between the different namespaces of a user.
> 
> I don't yet see either in your or Viro's description how the trees
> under /share/$USER are initialized.  I guess they are recursively
> bound from /, and are made slaves.

Yes. I suppose that when a userid is created, one of the steps would be:

mount --rbind / /share/$USER
mount --make-rslave /share/$USER
mount --make-rshared /share/$USER
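
A minimal sketch of that account-creation step as a shell helper,
assuming a mount(8) that accepts the propagation flags used above.  The
names init_user_tree and run are only illustrative and do not come from
any existing tool.  By default the commands are printed rather than
executed; set DRY_RUN=0 and run as root to apply them.

```shell
# Dry-run wrapper (hypothetical): print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# init_user_tree USER: set up the per-user tree under /share.
init_user_tree() {
    user=$1
    run mkdir -p "/share/$user"
    # Recursively bind the whole mount tree of / onto /share/USER.
    run mount --rbind / "/share/$user"
    # Receive propagation from the original mounts, send nothing back.
    run mount --make-rslave "/share/$user"
    # Make the tree shared again so later binds and clones become peers.
    run mount --make-rshared "/share/$user"
}
```

Calling "init_user_tree alice" in the default dry-run mode just lists
the four commands, which makes it easy to review them before running
anything as root.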

RP

> Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                           ` <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
@ 2007-04-13 11:58                             ` Miklos Szeredi
       [not found]                               ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  2007-04-13 20:07                               ` Karel Zak
  0 siblings, 2 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-13 11:58 UTC (permalink / raw)
  To: linuxram-r/Jw6+rmf7HQT0dZR+AlfA
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > 1. clone the master namespace.
> > > 
> > > 2. in the new namespace
> > > 
> > > 	move the tree under /share/$me to /
> > >         for each ($user, $what, $how) {
> > >             move /share/$user/$what to /$what
> > > 	    if ($how == slave) {
> > >                  make the mount tree under /$what as slave
> > >             }
> > >         }
> > >         
> > > 3. in the new namespace make the tree under 
> > >        /share as private and unmount /share
> > 
> > Thanks.  I get the basic idea now: the namespace itself need not be
> > shared between the sessions, it is enough if "share" propagation is
> > set up between the different namespaces of a user.
> > 
> > I don't yet see either in your or Viro's description how the trees
> > under /share/$USER are initialized.  I guess they are recursively
> > bound from /, and are made slaves.
> 
> yes. I suppose, when a userid is created one of the steps would be
> 
> mount --rbind / /share/$USER
> mount --make-rslave /share/$USER
> mount --make-rshared /share/$USER

Thinking a bit more about this, I'm quite sure most users wouldn't
even want private namespaces.  It would be enough to

  chroot /share/$USER

and be done with it.

Private namespaces are only good for keeping a bunch of mounts
referenced by a group of processes.  But my guess is that the natural
behavior for users is to see a persistent set of mounts.

If for example they mount something on a remote machine, then log out
from the ssh session and later log back in, they would want to see
their previous mount still there.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                               ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-13 13:28                                 ` Serge E. Hallyn
  2007-04-13 14:05                                   ` Miklos Szeredi
  2007-04-16  7:59                                 ` Ram Pai
  1 sibling, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-13 13:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linuxram-r/Jw6+rmf7HQT0dZR+AlfA, serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > > 1. clone the master namespace.
> > > > 
> > > > 2. in the new namespace
> > > > 
> > > > 	move the tree under /share/$me to /
> > > >         for each ($user, $what, $how) {
> > > >             move /share/$user/$what to /$what
> > > > 	    if ($how == slave) {
> > > >                  make the mount tree under /$what as slave
> > > >             }
> > > >         }
> > > >         
> > > > 3. in the new namespace make the tree under 
> > > >        /share as private and unmount /share
> > > 
> > > Thanks.  I get the basic idea now: the namespace itself need not be
> > > shared between the sessions, it is enough if "share" propagation is
> > > set up between the different namespaces of a user.
> > > 
> > > I don't yet see either in your or Viro's description how the trees
> > > under /share/$USER are initialized.  I guess they are recursively
> > > bound from /, and are made slaves.
> > 
> > yes. I suppose, when a userid is created one of the steps would be
> > 
> > mount --rbind / /share/$USER
> > mount --make-rslave /share/$USER
> > mount --make-rshared /share/$USER
> 
> Thinking a bit more about this, I'm quite sure most users wouldn't
> even want private namespaces.  It would be enough to
> 
>   chroot /share/$USER
> 
> and be done with it.
> 
> Private namespaces are only good for keeping a bunch of mounts
> referenced by a group of processes.  But my guess is, that the natural
> behavior for users is to see a persistent set of mounts.
> 
> If for example they mount something on a remote machine, then log out
> from the ssh session and later log back in, they would want to see
> their previous mount still there.
> 
> Miklos

Agreed on desired behavior, but not on chroot sufficing.  It actually
sounds like you want exactly what was outlined in the OLS paper.

Users still need to be in a different mounts namespace from the admin
user so long as we consider the deluser and backup problems to be
legitimate problems (well, so long as user mounts are allowed).  So,
when they log in, pam gives them a new namespace and chroots them into
/share/$USER.
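
A rough sketch of that login step, assuming present-day util-linux
unshare(1) (with its --propagation option) and coreutils chroot(1) (with
--userspec).  The helper names login_session and run are hypothetical;
a real PAM module would make the equivalent calls directly.  Commands
are only printed unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (hypothetical): print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# login_session USER: new mount namespace first, then chroot into the
# per-user tree and drop privileges to USER.
login_session() {
    user=$1
    # "--propagation unchanged" keeps the shared/slave relations of the
    # cloned mounts as they are; recent unshare(1) versions otherwise
    # mark everything private in the new namespace.
    run unshare --mount --propagation unchanged \
        chroot --userspec="$user" "/share/$user" /bin/sh -l
}
```

The order matters here: the namespace is cloned while still privileged,
and only the final shell runs as the user.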

Assuming I'm thinking clearly  :)

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-13 13:28                                 ` Serge E. Hallyn
@ 2007-04-13 14:05                                   ` Miklos Szeredi
  2007-04-13 21:44                                     ` Serge E. Hallyn
       [not found]                                     ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  0 siblings, 2 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-13 14:05 UTC (permalink / raw)
  To: serue
  Cc: linuxram, serue, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel

> > Thinking a bit more about this, I'm quite sure most users wouldn't
> > even want private namespaces.  It would be enough to
> > 
> >   chroot /share/$USER
> > 
> > and be done with it.
> > 
> > Private namespaces are only good for keeping a bunch of mounts
> > referenced by a group of processes.  But my guess is, that the natural
> > behavior for users is to see a persistent set of mounts.
> > 
> > If for example they mount something on a remote machine, then log out
> > from the ssh session and later log back in, they would want to see
> > their previous mount still there.
> > 
> > Miklos
> 
> Agreed on desired behavior, but not on chroot sufficing.  It actually
> sounds like you want exactly what was outlined in the OLS paper.
> 
> Users still need to be in a different mounts namespace from the admin
> user so long as we consider the deluser and backup problems

I don't think it matters, because /share/$USER duplicates a part or
the whole of the user's namespace.

So backup would have to be taught about /share anyway, and deluser
operates on /home/$USER and not on /share/*, so there shouldn't be any
problem.

There's actually very little difference between rbind+chroot, and
CLONE_NEWNS.  In a private namespace:

  1) when no more processes reference the namespace, the tree will be
    disbanded

  2) the mount tree won't be accessible from outside the namespace

Wanting a persistent namespace contradicts 1).

Wanting a per-user (as opposed to per-session) namespace contradicts
2).  The namespace _has_ to be accessible from outside, so that a new
session can access/copy it.

So both requirements point to the rbind/chroot solution.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-13 14:05                                   ` Miklos Szeredi
@ 2007-04-13 21:44                                     ` Serge E. Hallyn
       [not found]                                       ` <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
       [not found]                                     ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  1 sibling, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-13 21:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue, linuxram, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel

Quoting Miklos Szeredi (miklos@szeredi.hu):
> > > Thinking a bit more about this, I'm quite sure most users wouldn't
> > > even want private namespaces.  It would be enough to
> > > 
> > >   chroot /share/$USER
> > > 
> > > and be done with it.
> > > 
> > > Private namespaces are only good for keeping a bunch of mounts
> > > referenced by a group of processes.  But my guess is, that the natural
> > > behavior for users is to see a persistent set of mounts.
> > > 
> > > If for example they mount something on a remote machine, then log out
> > > from the ssh session and later log back in, they would want to see
> > > their previous mount still there.
> > > 
> > > Miklos
> > 
> > Agreed on desired behavior, but not on chroot sufficing.  It actually
> > sounds like you want exactly what was outlined in the OLS paper.
> > 
> > Users still need to be in a different mounts namespace from the admin
> > user so long as we consider the deluser and backup problems
> 
> I don't think it matters, because /share/$USER duplicates a part or
> the whole of the user's namespace.
> 
> So backup would have to be taught about /share anyway, and deluser
> operates on /home/$USER and not on /share/*, so there shouldn't be any
> problem.

In what I was thinking of, /share/$USER is bind mounted to
~$USER/share, so it would have to be done in a private namespace in
order for deluser to not be tricked.

> There's actually very little difference between rbind+chroot, and
> CLONE_NEWNS.  In a private namespace:
> 
>   1) when no more processes reference the namespace, the tree will be
>     disbanded
> 
>   2) the mount tree won't be accessible from outside the namespace

But it *can* be, if properly set up.  That's part of the point of the
example in the OLS paper.  When a user logs in, sshd clones a new
namespace, then bind-mounts /share/$USER into ~$USER/share.  So assuming
that /share/$USER was --make-shared'd, it and ~$USER are now in the
same peer group, and any changes made by the user under ~$USER will
be reflected back into /share/$USER.
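
As a sketch of the per-session part, assuming it runs inside the freshly
cloned namespace (for example under "unshare --mount --propagation
unchanged") and that home directories live under /home, which is an
assumption of this example.  session_share and run are made-up helper
names; commands are printed unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (hypothetical): print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# session_share USER [HOME]: bind the per-user shared tree into the
# user's home directory inside the new namespace.
session_share() {
    user=$1
    home=${2:-/home/$user}
    run mkdir -p "$home/share"
    # A bind of a shared mount joins the source's peer group, so mounts
    # made under $home/share propagate back to /share/USER.
    run mount --bind "/share/$user" "$home/share"
}
```

If /share/$USER were private instead of shared, the same bind would
succeed but nothing would propagate back.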

> Wanting a persistent namespace contradicts 1).

Not necessarily, see above.

> Wanting a per-user (as opposed to per-session) namespace contradicts
> 2).  The namespace _has_ to be accessible from outside, so that a new
> session can access/copy it.

Again, I *think* you are wrong that private namespace contradicts this
requirement.

> So both requirements point to the rbind/chroot solution.

It all points to a combination of the two  :-)

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                                       ` <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
@ 2007-04-15 20:39                                         ` Miklos Szeredi
       [not found]                                           ` <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-15 20:39 UTC (permalink / raw)
  To: serue-r/Jw6+rmf7HQT0dZR+AlfA
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, serue-r/Jw6+rmf7HQT0dZR+AlfA,
	linuxram-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> > > Agreed on desired behavior, but not on chroot sufficing.  It actually
> > > sounds like you want exactly what was outlined in the OLS paper.
> > > 
> > > Users still need to be in a different mounts namespace from the admin
> > > user so long as we consider the deluser and backup problems
> > 
> > I don't think it matters, because /share/$USER duplicates a part or
> > the whole of the user's namespace.
> > 
> > So backup would have to be taught about /share anyway, and deluser
> > operates on /home/$USER and not on /share/*, so there shouldn't be any
> > problem.
> 
> In what I was thinking of, /share/$USER is bind mounted to
> ~$USER/share, so it would have to be done in a private namespace in
> order for deluser to not be tricked.

But /share/$USER is surely not bind mounted to ~$USER/share in the
_global_ namespace, is it?  I can't see any sense in that.

> > There's actually very little difference between rbind+chroot, and
> > CLONE_NEWNS.  In a private namespace:
> > 
> >   1) when no more processes reference the namespace, the tree will be
> >     disbanded
> > 
> >   2) the mount tree won't be accessible from outside the namespace
> 
> But it *can* be, if properly set up.  That's part of the point of the
> example in the OLS paper.  When a user logs in, sshd clones a new
> namespace, then bind-mounts /share/$USER into ~$USER/share.  So assuming
> that /share/$USER was --make-shared'd, it and ~$USER are now in the
> same peer group, and any changes made by the user under ~$USER will
> be reflected back into /share/$USER.

I acknowledge that it can be done.  My point was that it can be done
more simply _without_ using CLONE_NEWNS.

> > Wanting a persistent namespace contradicts 1).
> 
> Not necessarily, see above.
> 
> > Wanting a per-user (as opposed to per-session) namespace contradicts
> > 2).  The namespace _has_ to be accessible from outside, so that a new
> > session can access/copy it.
> 
> Again, I *think* you are wrong that private namespace contradicts this
> requirement.

I'm not saying there's any contradiction, I'm saying rbind+chroot is a
better fit.

I haven't yet heard a single reason why a per-session namespace with
parts shared per-user is better than just a per-user namespace.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                                           ` <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-16  1:11                                             ` Serge E. Hallyn
  0 siblings, 0 replies; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-16  1:11 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA, linuxram-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> > > > Agreed on desired behavior, but not on chroot sufficing.  It actually
> > > > sounds like you want exactly what was outlined in the OLS paper.
> > > > 
> > > > Users still need to be in a different mounts namespace from the admin
> > > > user so long as we consider the deluser and backup problems
> > > 
> > > I don't think it matters, because /share/$USER duplicates a part or
> > > the whole of the user's namespace.
> > > 
> > > So backup would have to be taught about /share anyway, and deluser
> > > operates on /home/$USER and not on /share/*, so there shouldn't be any
> > > problem.
> > 
> > In what I was thinking of, /share/$USER is bind mounted to
> > ~$USER/share, so it would have to be done in a private namespace in
> > order for deluser to not be tricked.
> 
> But /share/$USER is surely not bind mounted to ~$USER/share in the
> _global_ namespace, is it?  I can't see any sense in that.

No it's not, only in the private namespace.

> > > There's actually very little difference between rbind+chroot, and
> > > CLONE_NEWNS.  In a private namespace:
> > > 
> > >   1) when no more processes reference the namespace, the tree will be
> > >     disbanded
> > > 
> > >   2) the mount tree won't be accessible from outside the namespace
> > 
> > But it *can* be, if properly set up.  That's part of the point of the
> > example in the OLS paper.  When a user logs in, sshd clones a new
> > namespace, then bind-mounts /share/$USER into ~$USER/share.  So assuming
> > that /share/$USER was --make-shared'd, it and ~$USER are now in the
> > same peer group, and any changes made by the user under ~$USER will
> > be reflected back into /share/$USER.
> 
> > I acknowledge that it can be done.  My point was that it can be done
> > more simply _without_ using CLONE_NEWNS.

Seems like a matter of preference, but I see what you're saying.

> > > Wanting a persistent namespace contradicts 1).
> > 
> > Not necessarily, see above.
> > 
> > > Wanting a per-user (as opposed to per-session) namespace contradicts
> > > 2).  The namespace _has_ to be accessible from outside, so that a new
> > > session can access/copy it.
> > 
> > Again, I *think* you are wrong that private namespace contradicts this
> > requirement.
> 
> I'm not saying there's any contradiction, I'm saying rbind+chroot is a
> better fit.

Ok, I see.

> I haven't yet heard a single reason why a per-session namespace with
> parts shared per-user is better than just a per-user namespace.

In fact I suspect we could show that they are functionally equivalent
(for your purposes) by drawing the fs tree and peer groups from
current->fs->root on up for both methods.

And not using private namespaces leaves the admin (at least for now)
better able to diagnose the state of the system.

-serge

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                                     ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-16  8:18                                       ` Ram Pai
       [not found]                                         ` <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Ram Pai @ 2007-04-16  8:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, 2007-04-13 at 16:05 +0200, Miklos Szeredi wrote:
> > > Thinking a bit more about this, I'm quite sure most users wouldn't
> > > even want private namespaces.  It would be enough to
> > > 
> > >   chroot /share/$USER
> > > 
> > > and be done with it.
> > > 
> > > Private namespaces are only good for keeping a bunch of mounts
> > > referenced by a group of processes.  But my guess is, that the natural
> > > behavior for users is to see a persistent set of mounts.
> > > 
> > > If for example they mount something on a remote machine, then log out
> > > from the ssh session and later log back in, they would want to see
> > > their previous mount still there.
> > > 
> > > Miklos
> > 
> > Agreed on desired behavior, but not on chroot sufficing.  It actually
> > sounds like you want exactly what was outlined in the OLS paper.
> > 
> > Users still need to be in a different mounts namespace from the admin
> > user so long as we consider the deluser and backup problems
> 
> I don't think it matters, because /share/$USER duplicates a part or
> the whole of the user's namespace.
> 
> So backup would have to be taught about /share anyway, and deluser
> operates on /home/$USER and not on /share/*, so there shouldn't be any
> problem.
> 
> There's actually very little difference between rbind+chroot, and
> CLONE_NEWNS.  In a private namespace:
> 
>   1) when no more processes reference the namespace, the tree will be
>     disbanded
> 
>   2) the mount tree won't be accessible from outside the namespace
> 
> Wanting a persistent namespace contradicts 1).
> 
> Wanting a per-user (as opposed to per-session) namespace contradicts
> 2).  The namespace _has_ to be accessible from outside, so that a new
> session can access/copy it.

As I mentioned in the previous mail, disbanding all the namespaces of a
user will not disband his mount tree, because a mirror of the mount tree
still continues to exist in /share/$USER in the admin namespace.

And a new user session can always use this copy to create a namespace
that looks identical to the one that existed earlier.


> 
> So both requirements point to the rbind/chroot solution.

Aren't there ways to escape chroot jails? Serge had pointed me to a URL
which showed chroots can be escaped. And if that is true, then having all
users' private mount trees in the same namespace can be a security issue?

RP

> 
> Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                                         ` <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
@ 2007-04-16  9:27                                           ` Miklos Szeredi
  2007-04-16 15:40                                             ` Eric W. Biederman
  0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-16  9:27 UTC (permalink / raw)
  To: linuxram-r/Jw6+rmf7HQT0dZR+AlfA
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> Aren't there ways to escape chroot jails? Serge had pointed me to a URL
> which showed chroots can be escaped. And if that is true, then having all
> users' private mount trees in the same namespace can be a security issue?

No.  In fact chrooting the user into /share/$USER will actually
_grant_ a privilege to the user, instead of taking it away.  It allows
the user to modify its root namespace, which it wouldn't be able to do
in the initial namespace.

So even if the user could escape from the chroot (which I doubt), s/he
would not be able to do any harm, since unprivileged mounting would be
restricted to /share.  Also /share/$USER should only have read/search
permission for $USER or no permissions at all, which would mean that
other users' namespaces would be safe from tampering as well.

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
  2007-04-16  9:27                                           ` Miklos Szeredi
@ 2007-04-16 15:40                                             ` Eric W. Biederman
       [not found]                                               ` <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
  0 siblings, 1 reply; 54+ messages in thread
From: Eric W. Biederman @ 2007-04-16 15:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linuxram, containers, linux-fsdevel, akpm, util-linux-ng,
	linux-kernel

Miklos Szeredi <miklos@szeredi.hu> writes:

>> Aren't there ways to escape chroot jails? Serge had pointed me to a URL
>> which showed chroots can be escaped. And if that is true, then having all
>> users' private mount trees in the same namespace can be a security issue?
>
> No.  In fact chrooting the user into /share/$USER will actually
> _grant_ a privilege to the user, instead of taking it away.  It allows
> the user to modify its root namespace, which it wouldn't be able to
> in the initial namespace.
>
> So even if the user could escape from the chroot (which I doubt), s/he
> would not be able to do any harm, since unprivileged mounting would be
> restricted to /share.  Also /share/$USER should only have read/search
> permission for $USER or no permissions at all, which would mean, that
> other users' namespaces would be safe from tampering as well.

A couple of points.
- chroot can be escaped; it is just a chdir for the root directory, it
  is not a security feature.  The only security is that you have to be
  root to call chroot.  A carefully done namespace setup won't have that
  issue.

- While it may not violate security as far as what a user is allowed to
  modify, it may violate security as far as what a user is allowed to see.

There are interesting per-login cases as well, such as allowing a user to
replicate their mount tree from another machine when they log in.  When
/home is on a network filesystem this can be very practical and can allow
propagation of mounts across machines, not just across a single login
session.

Eric

^ permalink raw reply	[flat|nested] 54+ messages in thread

[parent not found: <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                                               ` <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
@ 2007-04-16 15:55                                                 ` Miklos Szeredi
  0 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-16 15:55 UTC (permalink / raw)
  To: ebiederm-aS9lmoZGLiVWk0Htik3J/w
  Cc: linuxram-r/Jw6+rmf7HQT0dZR+AlfA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> >> Aren't there ways to escape chroot jails? Serge had pointed me to a URL
> >> which showed chroots can be escaped. And if that is true, then having all
> >> users' private mount trees in the same namespace can be a security issue?
> >
> > No.  In fact chrooting the user into /share/$USER will actually
> > _grant_ a privilege to the user, instead of taking it away.  It allows
> > the user to modify its root namespace, which it wouldn't be able to
> > in the initial namespace.
> >
> > So even if the user could escape from the chroot (which I doubt), s/he
> > would not be able to do any harm, since unprivileged mounting would be
> > restricted to /share.  Also /share/$USER should only have read/search
> > permission for $USER or no permissions at all, which would mean, that
> > other users' namespaces would be safe from tampering as well.
> 
> A couple of points.
> - chroot can be escaped, it is just a chdir for the root directory
> it is not a security feature.  The only security is that you have to
> be root to call chroot.  A carefully done namespace setup won't have
> that issue.
> 
> - While it may not violate security as far as what a user is allowed
> to modify it may violate security as far as what a user is allowed
> to see.

I think that's just up to the permissions in the global namespace.  In
this example if you 'chmod 0 /share' there won't be anything for the
user to see.

> There are interesting per login cases as well such as allowing a
> user to replicate their mount tree from another machine when they
> log in.  When /home is on a network filesystem this can be very
> practical and can allow propagation of mounts across machines not
> just across a single login session.

Yeah, sounds interesting, but I think it's better to get the basics
working first, and then we can start to think about the extras.

Btw, there's nothing that prevents cloning the namespace _after_
chrooting into the per-user tree.  That would still be simpler than
doing it the other way round: first creating per-session namespaces
and then setting up mount propagation between them.
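
A sketch of that ordering, assuming present-day util-linux unshare(1)
and that the per-user tree is a recursive bind of /, so that unshare
and su are available inside it.  chroot_then_clone and run are
hypothetical names used only for this example; commands are printed
unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (hypothetical): print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# chroot_then_clone USER: enter the per-user tree first, then clone the
# mount namespace from inside it and start a login shell as USER.
chroot_then_clone() {
    user=$1
    # "--propagation unchanged" keeps the per-user tree shared, so the
    # session still sees mounts made by the user's other sessions.
    run chroot "/share/$user" \
        unshare --mount --propagation unchanged su - "$user"
}
```

Compared with cloning first, the only per-session work left is the
unshare itself; the per-user tree already carries the propagation setup.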

Miklos

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                               ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
  2007-04-13 13:28                                 ` Serge E. Hallyn
@ 2007-04-16  7:59                                 ` Ram Pai
  1 sibling, 0 replies; 54+ messages in thread
From: Ram Pai @ 2007-04-16  7:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, 2007-04-13 at 13:58 +0200, Miklos Szeredi wrote:
> > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > > 1. clone the master namespace.
> > > > 
> > > > 2. in the new namespace
> > > > 
> > > > 	move the tree under /share/$me to /
> > > >         for each ($user, $what, $how) {
> > > >             move /share/$user/$what to /$what
> > > > 	    if ($how == slave) {
> > > >                  make the mount tree under /$what as slave
> > > >             }
> > > >         }
> > > >         
> > > > 3. in the new namespace make the tree under 
> > > >        /share as private and unmount /share
> > > 
> > > Thanks.  I get the basic idea now: the namespace itself need not be
> > > shared between the sessions, it is enough if "share" propagation is
> > > set up between the different namespaces of a user.
> > > 
> > > I don't yet see either in your or Viro's description how the trees
> > > under /share/$USER are initialized.  I guess they are recursively
> > > bound from /, and are made slaves.
> > 
> > yes. I suppose, when a userid is created one of the steps would be
> > 
> > mount --rbind / /share/$USER
> > mount --make-rslave /share/$USER
> > mount --make-rshared /share/$USER
> 
> Thinking a bit more about this, I'm quite sure most users wouldn't
> even want private namespaces.  It would be enough to
> 
>   chroot /share/$USER
> 
> and be done with it.
> 
> Private namespaces are only good for keeping a bunch of mounts
> referenced by a group of processes.  But my guess is, that the natural
> behavior for users is to see a persistent set of mounts.
> 
> If for example they mount something on a remote machine, then log out
> from the ssh session and later log back in, they would want to see
> their previous mount still there.

They will continue to see their previous mount tree.
Even if all the namespaces belonging to the different sessions of the
user get dismantled when all the sessions exit, a mirror of those
mount trees continues to exist under /share/$USER in the original
namespace.  So I don't think we have an issue.

NOTE: when I say 'original namespace' I mean the admin namespace; the
first namespace that gets created when the machine boots.

RP


> 
> Miklos


* Re: [patch 0/8] unprivileged mount syscall
  2007-04-13 11:58                             ` Miklos Szeredi
       [not found]                               ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-13 20:07                               ` Karel Zak
       [not found]                                 ` <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org>
  1 sibling, 1 reply; 54+ messages in thread
From: Karel Zak @ 2007-04-13 20:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linuxram, serue, akpm, linux-fsdevel, containers, util-linux-ng,
	linux-kernel

On Fri, Apr 13, 2007 at 01:58:59PM +0200, Miklos Szeredi wrote:
> > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > > 1. clone the master namespace.
> > > > 
> > > > 2. in the new namespace
> > > > 
> > > >     move the tree under /share/$me to /
> > > >     for each ($user, $what, $how) {
> > > >         move /share/$user/$what to /$what
> > > >         if ($how == slave) {
> > > >             make the mount tree under /$what as slave
> > > >         }
> > > >     }
> > > >
> > > > 3. in the new namespace make the tree under
> > > >    /share as private and unmount /share
> > > 
> > > Thanks.  I get the basic idea now: the namespace itself need not be
> > > shared between the sessions, it is enough if "share" propagation is
> > > set up between the different namespaces of a user.
> > > 
> > > I don't yet see either in your or Viro's description how the trees
> > > under /share/$USER are initialized.  I guess they are recursively
> > > bound from /, and are made slaves.
> > 
> > yes. I suppose, when a userid is created one of the steps would be
> > 
> > mount --rbind / /share/$USER
> > mount --make-rslave /share/$USER
> > mount --make-rshared /share/$USER
> 
> Thinking a bit more about this, I'm quite sure most users wouldn't
> even want private namespaces.  It would be enough to
> 
>   chroot /share/$USER
> 
> and be done with it.

 I don't think so. How do you want to implement non-shared /tmp
 directories? The chroot is overkill in this case. See:

 http://www.coker.com.au/selinux/talks/sage-2006/PolyInstantiatedDirectories.html
 http://danwalsh.livejournal.com/

> Private namespaces are only good for keeping a bunch of mounts
> referenced by a group of processes.  But my guess is, that the natural
> behavior for users is to see a persistent set of mounts.
> 
> If for example they mount something on a remote machine, then log out
> from the ssh session and later log back in, they would want to see
> their previous mount still there.

 They can mount to /mnt, where the directory is shared ("mount
 --make-shared /mnt") and visible in all namespaces.
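
 A minimal sketch of that setup, as a shell function that only prints
 the commands.  The name plan_shared_dir is hypothetical.  The sketch
 assumes the directory is not yet a mount point, so it is first
 bind-mounted onto itself, because propagation flags can only be set on
 a mount point.

```shell
#!/bin/sh
# Print (do not run) commands that turn a directory into a shared mount.
# Assumption: the directory is not already a mount point.
plan_shared_dir() {
    dir=$1
    # Make the directory a mount point by binding it onto itself.
    printf 'mount --bind %s %s\n' "$dir" "$dir"
    # Mark that mount shared so later mounts under it propagate to peers.
    printf 'mount --make-shared %s\n' "$dir"
}
```

 Running "plan_shared_dir /mnt | sh" as root before any namespace is
 cloned would apply it; namespaces cloned afterwards become peers of
 the shared mount.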

 I think /share/$USER is an extreme example. You can find more
 situations where private namespaces are a nice solution.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>


[parent not found: <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org>]

* Re: [patch 0/8] unprivileged mount syscall
       [not found]                                 ` <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org>
@ 2007-04-15 20:21                                   ` Miklos Szeredi
  0 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-15 20:21 UTC (permalink / raw)
  To: kzak-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linuxram-r/Jw6+rmf7HQT0dZR+AlfA, serue-r/Jw6+rmf7HQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	containers-qjLDD68F18O7TbgM5vRIOg,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

> > Thinking a bit more about this, I'm quite sure most users wouldn't
> > even want private namespaces.  It would be enough to
> > 
> >   chroot /share/$USER
> > 
> > and be done with it.
> 
>  I don't think so. How do you want to implement non-shared /tmp
>  directories?

  mount --bind /.tmp/$USER /share/$USER/tmp

or whatever else this polyunsaturated thingy does within the cloned
namespace.
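
A minimal sketch of that per-user /tmp bind, as a shell function that
only prints the commands.  The name plan_private_tmp is hypothetical,
and the mkdir, chown and chmod steps are assumptions added for
illustration; only the final bind mount comes from the line above.

```shell
#!/bin/sh
# Print (do not run) commands giving one user a non-shared /tmp
# inside the /share/<user> tree.
plan_private_tmp() {
    user=$1
    # Assumption: backing directories live under /.tmp, one per user.
    printf 'mkdir -p /.tmp/%s\n' "$user"
    # Assumption: the directory is owned by, and private to, that user.
    printf 'chown %s /.tmp/%s\n' "$user" "$user"
    printf 'chmod 0700 /.tmp/%s\n' "$user"
    # The bind mount itself, as written in the message.
    printf 'mount --bind /.tmp/%s /share/%s/tmp\n' "$user" "$user"
}
```

Running "plan_private_tmp bob | sh" as root would apply it for a user
named bob.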

> The chroot is overkill in this case.

What do you mean it's overkill?  clone(CLONE_NS) duplicates all the
mounts, just as mount --rbind does.

> > Private namespaces are only good for keeping a bunch of mounts
> > referenced by a group of processes.  But my guess is, that the natural
> > behavior for users is to see a persistent set of mounts.
> > 
> > If for example they mount something on a remote machine, then log out
> > from the ssh session and later log back in, they would want to see
> > their previous mount still there.
> 
>  They can mount to /mnt, where the directory is shared ("mount
>  --make-shared /mnt") and visible in all namespaces.
> 
>  I think /share/$USER is an extreme example. You can find more
>  situations where private namespaces are a nice solution.

Private to a single login session?  I'd like to hear examples.

Thanks,
Miklos


* Re: [patch 0/8] unprivileged mount syscall
       [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
  2007-04-06 23:02   ` Andrew Morton
@ 2007-04-09 22:00   ` Serge E. Hallyn
  2007-04-11 10:32     ` Miklos Szeredi
  1 sibling, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-09 22:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	util-linux-ng-u79uwXL29TY76Z2rM5mHXA

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> This patchset adds support for keeping mount ownership information in
> the kernel, and allow unprivileged mount(2) and umount(2) in certain
> cases.

Well, I'd like to feel all smart and point out some bugs, but the code
all reads very nicely, seems to work as advertised, and while I won't
have ltp results until tomorrow, boot test results in so far are all
successful.

Looks good.

-serge

> This can be useful for the following reasons:
> 
> - mount(8) can store ownership ("user=XY" option) in the kernel
>   instead, or in addition to storing it in /etc/mtab.  For example if
>   private namespaces are used with mount propagations /etc/mtab
>   becomes unworkable, but using /proc/mounts works fine
> 
> - fuse won't need a special suid-root mount/umount utility.  Plain
>   umount(8) can easily be made to work with unprivileged fuse mounts
> 
> - users can use bind mounts without having to pre-configure them in
>   /etc/fstab
> 
> All this is done in a secure way, and unprivileged bind and fuse
> mounts are disabled by default and can be enabled through sysctl or
> /proc/sys.
> 
> One thing that is missing from this series is the ability to restrict
> user mounts to private namespaces.  The reason is that private
> namespaces have still not gained the momentum and support needed for
> painless user experience.  So such a feature would not yet get enough
> attention and testing.  However adding such an optional restriction
> can be done with minimal changes in the future, once private
> namespaces have matured.
> 
> An earlier version of these patches have been discussed here:
> 
>   http://lkml.org/lkml/2005/5/3/64
> 


* Re: [patch 0/8] unprivileged mount syscall
  2007-04-09 22:00   ` Serge E. Hallyn
@ 2007-04-11 10:32     ` Miklos Szeredi
  0 siblings, 0 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-11 10:32 UTC (permalink / raw)
  To: serue; +Cc: akpm, linux-fsdevel, util-linux-ng

> > This patchset adds support for keeping mount ownership information in
> > the kernel, and allow unprivileged mount(2) and umount(2) in certain
> > cases.
> 
> Well, I'd like to feel all smart and point out some bugs, but the code
> all reads very nicely, seems to work as advertised, and while I won't
> have ltp results until tomorrow, boot test results in so far are all
> successful.
> 
> Looks good.

Thanks for the review and testing!

Miklos


Thread overview: 54+ messages
2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
2007-04-04 18:30 ` [patch 1/8] add user mounts to the kernel Miklos Szeredi
2007-04-04 18:30 ` [patch 2/8] allow unprivileged umount Miklos Szeredi
2007-04-04 18:30 ` [patch 3/8] account user mounts Miklos Szeredi
2007-04-04 18:30 ` [patch 4/8] propagate error values from clone_mnt Miklos Szeredi
2007-04-04 18:30 ` [patch 5/8] allow unprivileged bind mounts Miklos Szeredi
2007-04-04 18:30 ` [patch 6/8] put declaration of put_filesystem() in fs.h Miklos Szeredi
2007-04-04 18:30 ` [patch 7/8] allow unprivileged mounts Miklos Szeredi
2007-04-04 18:30 ` [patch 8/8] allow unprivileged fuse mounts Miklos Szeredi
2007-04-09 18:57 ` [patch 0/8] unprivileged mount syscall Serge E. Hallyn
2007-04-09 20:14   ` Miklos Szeredi
2007-04-09 20:55     ` Serge E. Hallyn
     [not found]       ` <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-11 19:43         ` Miklos Szeredi
     [not found]           ` <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-11 20:05             ` Serge E. Hallyn
2007-04-11 20:41               ` Miklos Szeredi
2007-04-11 20:57                 ` Serge E. Hallyn
     [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
2007-04-06 23:02   ` Andrew Morton
2007-04-06 23:16     ` H. Peter Anvin
2007-04-06 23:55       ` Jan Engelhardt
2007-04-07  0:22         ` H. Peter Anvin
2007-04-07  3:40           ` Eric Van Hensbergen
     [not found]             ` <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-04-07  6:48               ` Miklos Szeredi
2007-04-10  8:52       ` Ian Kent
     [not found]         ` <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
2007-04-11 10:48           ` Miklos Szeredi
2007-04-11 13:48             ` Ian Kent
     [not found]               ` <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
2007-04-11 14:26                 ` Serge E. Hallyn
     [not found]                   ` <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-11 14:27                     ` Ian Kent
     [not found]                       ` <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
2007-04-11 14:45                         ` Serge E. Hallyn
2007-04-07  6:41     ` Miklos Szeredi
     [not found]       ` <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-09 14:38         ` Serge E. Hallyn
     [not found]           ` <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-09 16:24             ` Miklos Szeredi
2007-04-09 17:07               ` Serge E. Hallyn
2007-04-09 17:46                 ` Ram Pai
2007-04-09 18:25                   ` H. Peter Anvin
2007-04-10 10:33                   ` Karel Zak
2007-04-09 20:10                 ` Miklos Szeredi
2007-04-10  8:38                   ` Ram Pai
2007-04-11 10:44                     ` Miklos Szeredi
     [not found]                       ` <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-11 18:28                         ` Ram Pai
     [not found]                           ` <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
2007-04-13 11:58                             ` Miklos Szeredi
     [not found]                               ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-13 13:28                                 ` Serge E. Hallyn
2007-04-13 14:05                                   ` Miklos Szeredi
2007-04-13 21:44                                     ` Serge E. Hallyn
     [not found]                                       ` <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-15 20:39                                         ` Miklos Szeredi
     [not found]                                           ` <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-16  1:11                                             ` Serge E. Hallyn
     [not found]                                     ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-16  8:18                                       ` Ram Pai
     [not found]                                         ` <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
2007-04-16  9:27                                           ` Miklos Szeredi
2007-04-16 15:40                                             ` Eric W. Biederman
     [not found]                                               ` <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-04-16 15:55                                                 ` Miklos Szeredi
2007-04-16  7:59                                 ` Ram Pai
2007-04-13 20:07                               ` Karel Zak
     [not found]                                 ` <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org>
2007-04-15 20:21                                   ` Miklos Szeredi
2007-04-09 22:00   ` Serge E. Hallyn
2007-04-11 10:32     ` Miklos Szeredi
