* [patch 0/8] unprivileged mount syscall
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

This patchset adds support for keeping mount ownership information in
the kernel, and allows unprivileged mount(2) and umount(2) in certain
cases.

This can be useful for the following reasons:

 - mount(8) can store ownership ("user=XY" option) in the kernel
   instead of, or in addition to, storing it in /etc/mtab.  For
   example, if private namespaces are used with mount propagation,
   /etc/mtab becomes unworkable, but using /proc/mounts works fine

 - fuse won't need a special suid-root mount/umount utility.  Plain
   umount(8) can easily be made to work with unprivileged fuse mounts

 - users can use bind mounts without having to pre-configure them in
   /etc/fstab

All this is done in a secure way, and unprivileged bind and fuse
mounts are disabled by default and can be enabled through sysctl or
/proc/sys.

One thing that is missing from this series is the ability to restrict
user mounts to private namespaces.  The reason is that private
namespaces have still not gained the momentum and support needed for
a painless user experience.  So such a feature would not yet get enough
attention and testing.  However, adding such an optional restriction
can be done with minimal changes in the future, once private
namespaces have matured.

An earlier version of these patches has been discussed here:

  http://lkml.org/lkml/2005/5/3/64

--

* [patch 1/8] add user mounts to the kernel
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: mount_owner.patch --]
[-- Type: text/plain, Size: 6257 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add ownership information to mounts.

A new mount flag, MS_SETUSER, is used to make a mount owned by a user.
If this flag is specified, then the owner will be set to the current
real user id and the mount will be marked with the MNT_USER flag.  On
remount don't preserve the previous owner, and treat MS_SETUSER as for
a new mount.  The MS_SETUSER flag is ignored on mount move.

The MNT_USER flag is not copied on any kind of mount cloning:
namespace creation, binding or propagation.  For bind mounts the
cloned mount(s) are set to MNT_USER depending on the MS_SETUSER mount
flag.  In all the other cases MNT_USER is always cleared.

For MNT_USER mounts a "user=UID" option is added to /proc/PID/mounts.
This is compatible with how mount ownership is stored in /etc/mtab.

It is expected that in the future mount(8) will use MS_SETUSER to
store mount ownership within the kernel.  This would help in
situations where /etc/mtab is difficult or impossible to work with,
e.g. when using mount propagation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:41.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:29:47.000000000 +0200
@@ -227,6 +227,13 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	BUG_ON(mnt->mnt_flags & MNT_USER);
+	mnt->mnt_uid = current->uid;
+	mnt->mnt_flags |= MNT_USER;
+}
+
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
@@ -241,6 +248,11 @@ static struct vfsmount *clone_mnt(struct
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
 
+		/* don't copy the MNT_USER flag */
+		mnt->mnt_flags &= ~MNT_USER;
+		if (flag & CL_SETUSER)
+			set_mnt_user(mnt);
+
 		if (flag & CL_SLAVE) {
 			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
 			mnt->mnt_master = old;
@@ -390,6 +402,8 @@ static int show_vfsmnt(struct seq_file *
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
+	if (mnt->mnt_flags & MNT_USER)
+		seq_printf(m, ",user=%i", mnt->mnt_uid);
 	if (mnt->mnt_sb->s_op->show_options)
 		err = mnt->mnt_sb->s_op->show_options(m, mnt);
 	seq_puts(m, " 0 0\n");
@@ -901,8 +915,9 @@ static int do_change_type(struct nameida
 /*
  * do loopback mount.
  */
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static int do_loopback(struct nameidata *nd, char *old_name, int flags)
 {
+	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err = mount_is_safe(nd);
@@ -922,11 +937,12 @@ static int do_loopback(struct nameidata
 	if (!check_mnt(nd->mnt) || !check_mnt(old_nd.mnt))
 		goto out;
 
+	clone_flags = (flags & MS_SETUSER) ? CL_SETUSER : 0;
 	err = -ENOMEM;
-	if (recurse)
-		mnt = copy_tree(old_nd.mnt, old_nd.dentry, 0);
+	if (flags & MS_REC)
+		mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags);
 	else
-		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, 0);
+		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags);
 
 	if (!mnt)
 		goto out;
@@ -968,8 +984,11 @@ static int do_remount(struct nameidata *
 
 	down_write(&sb->s_umount);
 	err = do_remount_sb(sb, flags, data, 0);
-	if (!err)
+	if (!err) {
 		nd->mnt->mnt_flags = mnt_flags;
+		if (flags & MS_SETUSER)
+			set_mnt_user(nd->mnt);
+	}
 	up_write(&sb->s_umount);
 	if (!err)
 		security_sb_post_remount(nd->mnt, flags, data);
@@ -1074,10 +1093,13 @@ static int do_new_mount(struct nameidata
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	mnt = do_kern_mount(type, flags, name, data);
+	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	if (flags & MS_SETUSER)
+		set_mnt_user(mnt);
+
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
 }
 
@@ -1108,7 +1130,8 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	newmnt->mnt_flags = mnt_flags;
+	/* MNT_USER was set earlier */
+	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
 
@@ -1428,7 +1451,7 @@ long do_mount(char *dev_name, char *dir_
 		retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+		retval = do_loopback(&nd, dev_name, flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:29:41.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:29:47.000000000 +0200
@@ -122,6 +122,7 @@ extern int dir_notify_enable;
 #define MS_SLAVE	(1<<19)	/* change to slave */
 #define MS_SHARED	(1<<20)	/* change to shared */
 #define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
+#define MS_SETUSER	(1<<22)	/* set mnt_uid to current user */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
Index: linux/include/linux/mount.h
===================================================================
--- linux.orig/include/linux/mount.h	2007-04-04 19:27:47.000000000 +0200
+++ linux/include/linux/mount.h	2007-04-04 19:29:47.000000000 +0200
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
 #define MNT_RELATIME	0x20
+#define MNT_USER	0x40
 
 #define MNT_SHRINKABLE	0x100
 
@@ -61,6 +62,8 @@ struct vfsmount {
 	atomic_t mnt_count;
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	int mnt_pinned;
+
+	uid_t mnt_uid;			/* owner of the mount */
 };
 
 static inline struct vfsmount *mntget(struct vfsmount *mnt)
Index: linux/fs/pnode.h
===================================================================
--- linux.orig/fs/pnode.h	2007-04-04 19:27:47.000000000 +0200
+++ linux/fs/pnode.h	2007-04-04 19:29:47.000000000 +0200
@@ -22,6 +22,7 @@
 #define CL_COPY_ALL		0x04
 #define CL_MAKE_SHARED		0x08
 #define CL_PROPAGATION		0x10
+#define CL_SETUSER		0x20
 
 static inline void set_mnt_shared(struct vfsmount *mnt)
 {

--
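
The ownership rules described in patch 1/8 can be modeled in userspace roughly as follows. This is only an illustrative sketch, not kernel code: `struct mount_model`, `model_set_user()` and `model_clone()` are hypothetical names, the caller's uid is passed in explicitly instead of being read from `current`, and only the `MNT_USER` and `CL_SETUSER` values are taken from the patch.

```c
#include <assert.h>
#include <sys/types.h>

#define MNT_USER   0x40	/* value from the patch (include/linux/mount.h) */
#define CL_SETUSER 0x20	/* value from the patch (fs/pnode.h) */

/* Hypothetical stand-in for the two struct vfsmount fields the patch uses. */
struct mount_model {
	int   mnt_flags;
	uid_t mnt_uid;
};

/* Mirrors set_mnt_user(): record the owner and mark the mount as owned. */
static void model_set_user(struct mount_model *mnt, uid_t caller)
{
	assert(!(mnt->mnt_flags & MNT_USER));	/* the patch uses BUG_ON() here */
	mnt->mnt_uid = caller;
	mnt->mnt_flags |= MNT_USER;
}

/*
 * Mirrors the clone_mnt() hunk: ownership is never inherited from the
 * source mount; it is only set when the caller passes CL_SETUSER.
 */
static struct mount_model model_clone(const struct mount_model *old,
				      int cl_flags, uid_t caller)
{
	struct mount_model mnt = *old;

	mnt.mnt_flags &= ~MNT_USER;
	mnt.mnt_uid = 0;	/* model only: the copy starts without an owner */
	if (cl_flags & CL_SETUSER)
		model_set_user(&mnt, caller);
	return mnt;
}
```

In this model, cloning a user-owned mount without `CL_SETUSER` yields an unowned copy, while cloning with `CL_SETUSER` yields a copy owned by the caller rather than by the owner of the source mount.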
* [patch 2/8] allow unprivileged umount
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: unprivileged_umount.patch --]
[-- Type: text/plain, Size: 1373 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

The owner doesn't need sysadmin capabilities to call umount().

Similar behavior as umount(8) on mounts having "user=UID" option in
/etc/mtab.  The difference is that umount also checks /etc/fstab,
presumably to exclude another mount on the same mountpoint.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:47.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:29:54.000000000 +0200
@@ -640,6 +640,25 @@ static int do_umount(struct vfsmount *mn
 }
 
 /*
+ * umount is permitted for
+ *  - sysadmin
+ *  - mount owner, if not forced umount
+ */
+static bool permit_umount(struct vfsmount *mnt, int flags)
+{
+	if (capable(CAP_SYS_ADMIN))
+		return true;
+
+	if (!(mnt->mnt_flags & MNT_USER))
+		return false;
+
+	if (flags & MNT_FORCE)
+		return false;
+
+	return mnt->mnt_uid == current->uid;
+}
+
+/*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
  *
@@ -662,7 +681,7 @@ asmlinkage long sys_umount(char __user *
 		goto dput_and_out;
 
 	retval = -EPERM;
-	if (!capable(CAP_SYS_ADMIN))
+	if (!permit_umount(nd.mnt, flags))
 		goto dput_and_out;
 
 	retval = do_umount(nd.mnt, flags);

--
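
The decision made by permit_umount() in patch 2/8 can be exercised outside the kernel with a small model. The sketch below is an assumption-laden illustration: `model_permit_umount()` and `MODEL_MNT_FORCE` are hypothetical names, and the capability check and the caller's uid are passed in as plain arguments instead of coming from `capable()` and `current`.

```c
#include <stdbool.h>
#include <sys/types.h>

#define MNT_USER        0x40	/* value from patch 1/8 */
#define MODEL_MNT_FORCE 0x01	/* stands in for the MNT_FORCE umount flag */

/*
 * Userspace model of permit_umount().  is_admin replaces
 * capable(CAP_SYS_ADMIN) and caller replaces current->uid.
 */
static bool model_permit_umount(bool is_admin, int mnt_flags, uid_t mnt_uid,
				uid_t caller, int umount_flags)
{
	if (is_admin)
		return true;		/* sysadmin may always umount */
	if (!(mnt_flags & MNT_USER))
		return false;		/* not a user-owned mount */
	if (umount_flags & MODEL_MNT_FORCE)
		return false;		/* forced umount stays privileged */
	return mnt_uid == caller;	/* only the owner may umount */
}
```

The order of the checks matters in this model: a forced umount is refused for the owner as well, so only the administrator can force one.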
* [patch 3/8] account user mounts
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: account_user_mounts.patch --]
[-- Type: text/plain, Size: 4851 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add sysctl variables for accounting and limiting the number of user
mounts.

The maximum number of user mounts is set to zero by default.  This
matches the behavior of previous kernels.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/include/linux/sysctl.h
===================================================================
--- linux.orig/include/linux/sysctl.h	2007-04-04 19:28:03.000000000 +0200
+++ linux/include/linux/sysctl.h	2007-04-04 19:29:57.000000000 +0200
@@ -813,6 +813,8 @@ enum
 	FS_AIO_NR=18,	/* current system-wide number of aio requests */
 	FS_AIO_MAX_NR=19,	/* system-wide maximum number of aio requests */
 	FS_INOTIFY=20,	/* inotify submenu */
+	FS_NR_USER_MOUNTS=21,	/* int:current number of user mounts */
+	FS_MAX_USER_MOUNTS=22,	/* int:maximum number of user mounts */
 	FS_OCFS2=988,	/* ocfs2 */
 };
 
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c	2007-04-04 19:28:03.000000000 +0200
+++ linux/kernel/sysctl.c	2007-04-04 19:29:57.000000000 +0200
@@ -984,6 +984,22 @@ static ctl_table fs_table[] = {
 #endif
 #endif
 	{
+		.ctl_name	= FS_NR_USER_MOUNTS,
+		.procname	= "nr_user_mounts",
+		.data		= &nr_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= FS_MAX_USER_MOUNTS,
+		.procname	= "max_user_mounts",
+		.data		= &max_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= KERN_SETUID_DUMPABLE,
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt	2007-04-04 19:27:59.000000000 +0200
+++ linux/Documentation/filesystems/proc.txt	2007-04-04 19:29:57.000000000 +0200
@@ -922,6 +922,18 @@ reaches aio-max-nr then io_setup will fa
 raising aio-max-nr does not result in the pre-allocation or re-sizing
 of any kernel data structures.
 
+nr_user_mounts and max_user_mounts
+----------------------------------
+
+These represent the number of "user" mounts and the maximum number of
+"user" mounts respectively.  User mounts may be created by
+unprivileged users.  User mounts may also be created with sysadmin
+privileges on behalf of a user, in which case nr_user_mounts may
+exceed max_user_mounts.
+
+By default max_user_mounts is zero.  If you wish to enable
+unprivileged mounts, set it to some sane value, (e.g. 1000).
+
 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
 -----------------------------------------------------------
 
Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:54.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:29:57.000000000 +0200
@@ -39,6 +39,9 @@ static int hash_mask __read_mostly, hash
 static struct kmem_cache *mnt_cache __read_mostly;
 static struct rw_semaphore namespace_sem;
 
+int nr_user_mounts;
+int max_user_mounts;
+
 /* /sys/fs */
 decl_subsys(fs, NULL, NULL);
 EXPORT_SYMBOL_GPL(fs_subsys);
@@ -227,11 +230,30 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void dec_nr_user_mounts(void)
+{
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts--;
+	spin_unlock(&vfsmount_lock);
+}
+
 static void set_mnt_user(struct vfsmount *mnt)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->uid;
 	mnt->mnt_flags |= MNT_USER;
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+}
+
+static void clear_mnt_user(struct vfsmount *mnt)
+{
+	if (mnt->mnt_flags & MNT_USER) {
+		mnt->mnt_uid = 0;
+		mnt->mnt_flags &= ~MNT_USER;
+		dec_nr_user_mounts();
+	}
 }
 
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
@@ -283,6 +305,7 @@ static inline void __mntput(struct vfsmo
 {
 	struct super_block *sb = mnt->mnt_sb;
 	dput(mnt->mnt_root);
+	clear_mnt_user(mnt);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
 }
@@ -1004,6 +1027,7 @@ static int do_remount(struct nameidata *
 	down_write(&sb->s_umount);
 	err = do_remount_sb(sb, flags, data, 0);
 	if (!err) {
+		clear_mnt_user(nd->mnt);
 		nd->mnt->mnt_flags = mnt_flags;
 		if (flags & MS_SETUSER)
 			set_mnt_user(nd->mnt);
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:29:47.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:29:57.000000000 +0200
@@ -49,6 +49,9 @@ extern struct inodes_stat_t inodes_stat;
 
 extern int leases_enable, lease_break_time;
 
+extern int nr_user_mounts;
+extern int max_user_mounts;
+
 #ifdef CONFIG_DNOTIFY
 extern int dir_notify_enable;
 #endif

--
* [patch 4/8] propagate error values from clone_mnt
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: clone_return_errno.patch --]
[-- Type: text/plain, Size: 5150 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow clone_mnt() to return errors other than ENOMEM.  This will be
used for returning a different error value when the number of user
mounts goes over the limit.

Fix copy_tree() to return EPERM for unbindable mounts.

Don't propagate further from dup_mnt_ns() as that copy_tree() can only
fail with -ENOMEM.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:29:57.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:30:00.000000000 +0200
@@ -261,42 +261,42 @@ static struct vfsmount *clone_mnt(struct
 {
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
 
-	if (mnt) {
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-
-		/* don't copy the MNT_USER flag */
-		mnt->mnt_flags &= ~MNT_USER;
-		if (flag & CL_SETUSER)
-			set_mnt_user(mnt);
-
-		if (flag & CL_SLAVE) {
-			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
-			mnt->mnt_master = old;
-			CLEAR_MNT_SHARED(mnt);
-		} else {
-			if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
-				list_add(&mnt->mnt_share, &old->mnt_share);
-			if (IS_MNT_SLAVE(old))
-				list_add(&mnt->mnt_slave, &old->mnt_slave);
-			mnt->mnt_master = old->mnt_master;
-		}
-		if (flag & CL_MAKE_SHARED)
-			set_mnt_shared(mnt);
+	mnt->mnt_flags = old->mnt_flags;
+	atomic_inc(&sb->s_active);
+	mnt->mnt_sb = sb;
+	mnt->mnt_root = dget(root);
+	mnt->mnt_mountpoint = mnt->mnt_root;
+	mnt->mnt_parent = mnt;
+
+	/* don't copy the MNT_USER flag */
+	mnt->mnt_flags &= ~MNT_USER;
+	if (flag & CL_SETUSER)
+		set_mnt_user(mnt);
 
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		if (flag & CL_EXPIRE) {
-			spin_lock(&vfsmount_lock);
-			if (!list_empty(&old->mnt_expire))
-				list_add(&mnt->mnt_expire, &old->mnt_expire);
-			spin_unlock(&vfsmount_lock);
-		}
+	if (flag & CL_SLAVE) {
+		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
+		mnt->mnt_master = old;
+		CLEAR_MNT_SHARED(mnt);
+	} else {
+		if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
+			list_add(&mnt->mnt_share, &old->mnt_share);
+		if (IS_MNT_SLAVE(old))
+			list_add(&mnt->mnt_slave, &old->mnt_slave);
+		mnt->mnt_master = old->mnt_master;
+	}
+	if (flag & CL_MAKE_SHARED)
+		set_mnt_shared(mnt);
+
+	/* stick the duplicate mount on the same expiry list
+	 * as the original if that was on one */
+	if (flag & CL_EXPIRE) {
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_expire))
+			list_add(&mnt->mnt_expire, &old->mnt_expire);
+		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
 }
@@ -762,11 +762,11 @@ struct vfsmount *copy_tree(struct vfsmou
 	struct nameidata nd;
 
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
-		return NULL;
+		return ERR_PTR(-EPERM);
 
 	res = q = clone_mnt(mnt, dentry, flag);
-	if (!q)
-		goto Enomem;
+	if (IS_ERR(q))
+		goto error;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
@@ -787,8 +787,8 @@ struct vfsmount *copy_tree(struct vfsmou
 			nd.mnt = q;
 			nd.dentry = p->mnt_mountpoint;
 			q = clone_mnt(p, p->mnt_root, flag);
-			if (!q)
-				goto Enomem;
+			if (IS_ERR(q))
+				goto error;
 			spin_lock(&vfsmount_lock);
 			list_add_tail(&q->mnt_list, &res->mnt_list);
 			attach_mnt(q, &nd);
@@ -796,7 +796,7 @@ struct vfsmount *copy_tree(struct vfsmou
 		}
 	}
 	return res;
-Enomem:
+ error:
 	if (res) {
 		LIST_HEAD(umount_list);
 		spin_lock(&vfsmount_lock);
@@ -804,7 +804,7 @@ Enomem:
 		spin_unlock(&vfsmount_lock);
 		release_mounts(&umount_list);
 	}
-	return NULL;
+	return q;
 }
 
 /*
@@ -980,13 +980,13 @@ static int do_loopback(struct nameidata
 		goto out;
 
 	clone_flags = (flags & MS_SETUSER) ? CL_SETUSER : 0;
-	err = -ENOMEM;
 	if (flags & MS_REC)
 		mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags);
 	else
 		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags);
 
-	if (!mnt)
+	err = PTR_ERR(mnt);
+	if (IS_ERR(mnt))
 		goto out;
 
 	err = graft_tree(mnt, nd);
@@ -1532,7 +1532,7 @@ struct mnt_namespace *dup_mnt_ns(struct
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
 					CL_COPY_ALL | CL_EXPIRE);
-	if (!new_ns->root) {
+	if (IS_ERR(new_ns->root)) {
 		up_write(&namespace_sem);
 		kfree(new_ns);
 		return NULL;
Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c	2007-04-04 19:27:47.000000000 +0200
+++ linux/fs/pnode.c	2007-04-04 19:30:00.000000000 +0200
@@ -187,8 +187,9 @@ int propagate_mnt(struct vfsmount *dest_
 
 		source =  get_source(m, prev_dest_mnt, prev_src_mnt, &type);
 
-		if (!(child = copy_tree(source, source->mnt_root, type))) {
-			ret = -ENOMEM;
+		child = copy_tree(source, source->mnt_root, type);
+		if (IS_ERR(child)) {
+			ret = PTR_ERR(child);
 			list_splice(tree_list, tmp_list.prev);
 			goto out;
 		}

--
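
Patch 4/8 switches clone_mnt() and copy_tree() from returning NULL on failure to the kernel's error-pointer convention (ERR_PTR, IS_ERR, PTR_ERR), so callers can tell -EPERM from -ENOMEM. The sketch below is a userspace approximation of that convention, assuming a flat address space where the top 4095 addresses are never valid objects; the `model_*` names and the toy `model_clone()` allocator are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (models ERR_PTR()). */
static void *model_err_ptr(long error)
{
	assert(error < 0 && error >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer (models PTR_ERR()). */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the reserved error range (models IS_ERR()). */
static int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/*
 * Hypothetical allocator with two distinct failure modes, so the caller
 * can see which one occurred instead of a bare NULL.
 */
static int *model_clone(int in_use, int limit)
{
	int *slot;

	if (in_use >= limit)
		return model_err_ptr(-EPERM);	/* over the limit */
	slot = malloc(sizeof(*slot));
	if (!slot)
		return model_err_ptr(-ENOMEM);	/* allocation failed */
	*slot = in_use + 1;
	return slot;
}
```

A caller checks `model_is_err()` before dereferencing the result and passes `model_ptr_err()` up as its own return value, which is the shape of the do_loopback() and propagate_mnt() hunks in the patch.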
* [patch 5/8] allow unprivileged bind mounts
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: unprivileged_bind_mount.patch --]
[-- Type: text/plain, Size: 3610 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow bind mounts to unprivileged users if the following conditions
are met:

  - mountpoint is not a symlink or special file
  - mountpoint is not a sticky directory or is owned by the current user
  - mountpoint is writable by user
  - the number of user mounts is below the maximum

Unprivileged mounts imply MS_SETUSER, and will also have the "nosuid"
and "nodev" mount flags set.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:30:00.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:30:02.000000000 +0200
@@ -237,11 +237,30 @@ static void dec_nr_user_mounts(void)
 	spin_unlock(&vfsmount_lock);
 }
 
-static void set_mnt_user(struct vfsmount *mnt)
+static int reserve_user_mount(void)
+{
+	int err = 0;
+	spin_lock(&vfsmount_lock);
+	if (nr_user_mounts >= max_user_mounts && !capable(CAP_SYS_ADMIN))
+		err = -EPERM;
+	else
+		nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+	return err;
+}
+
+static void __set_mnt_user(struct vfsmount *mnt)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->uid;
 	mnt->mnt_flags |= MNT_USER;
+	if (!capable(CAP_SYS_ADMIN))
+		mnt->mnt_flags |= MNT_NOSUID | MNT_NODEV;
+}
+
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	__set_mnt_user(mnt);
 	spin_lock(&vfsmount_lock);
 	nr_user_mounts++;
 	spin_unlock(&vfsmount_lock);
@@ -260,9 +279,16 @@ static struct vfsmount *clone_mnt(struct
 					int flag)
 {
 	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	struct vfsmount *mnt;
+
+	if (flag & CL_SETUSER) {
+		int err = reserve_user_mount();
+		if (err)
+			return ERR_PTR(err);
+	}
+	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
-		return ERR_PTR(-ENOMEM);
+		goto alloc_failed;
 
 	mnt->mnt_flags = old->mnt_flags;
 	atomic_inc(&sb->s_active);
@@ -274,7 +300,7 @@ static struct vfsmount *clone_mnt(struct
 	/* don't copy the MNT_USER flag */
 	mnt->mnt_flags &= ~MNT_USER;
 	if (flag & CL_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	if (flag & CL_SLAVE) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
@@ -299,6 +325,11 @@ static struct vfsmount *clone_mnt(struct
 		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
+
+ alloc_failed:
+	if (flag & CL_SETUSER)
+		dec_nr_user_mounts();
+	return ERR_PTR(-ENOMEM);
 }
 
 static inline void __mntput(struct vfsmount *mnt)
@@ -726,22 +757,23 @@ asmlinkage long sys_oldumount(char __use
 
 #endif
 
-static int mount_is_safe(struct nameidata *nd)
+static int mount_is_safe(struct nameidata *nd, int *flags)
 {
 	if (capable(CAP_SYS_ADMIN))
 		return 0;
-	return -EPERM;
-#ifdef notyet
-	if (S_ISLNK(nd->dentry->d_inode->i_mode))
+
+	if (!S_ISDIR(nd->dentry->d_inode->i_mode) &&
+	    !S_ISREG(nd->dentry->d_inode->i_mode))
 		return -EPERM;
 	if (nd->dentry->d_inode->i_mode & S_ISVTX) {
-		if (current->uid != nd->dentry->d_inode->i_uid)
+		if (current->fsuid != nd->dentry->d_inode->i_uid)
 			return -EPERM;
 	}
-	if (vfs_permission(nd, MAY_WRITE))
+	if (vfs_permission(nd, MAY_WRITE) || IS_APPEND(nd->dentry->d_inode))
 		return -EPERM;
+
+	*flags |= MS_SETUSER;
 	return 0;
-#endif
 }
 
 static int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
@@ -962,7 +994,7 @@ static int do_loopback(struct nameidata
 	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
-	int err = mount_is_safe(nd, &flags);
+	int err = mount_is_safe(nd, &flags);
 	if (err)
 		return err;
 	if (!old_name || !*old_name)

--
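
Patch 5/8 reserves a slot in the user mount count before allocating the new mount, and gives the slot back if the allocation fails. The following userspace sketch models that reserve-then-roll-back pattern under simplifying assumptions: it is single-threaded (the kernel code takes vfsmount_lock around the counter), the privilege check is a plain `is_admin` argument, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int model_nr_user_mounts;
static int model_max_user_mounts;	/* zero by default, as in patch 3/8 */

/* Models reserve_user_mount(): the limit only binds unprivileged callers. */
static int model_reserve_user_mount(bool is_admin)
{
	if (model_nr_user_mounts >= model_max_user_mounts && !is_admin)
		return -EPERM;
	model_nr_user_mounts++;
	return 0;
}

/* Models dec_nr_user_mounts(): give a reserved slot back. */
static void model_release_user_mount(void)
{
	assert(model_nr_user_mounts > 0);
	model_nr_user_mounts--;
}

/*
 * Models the CL_SETUSER path of clone_mnt(): reserve first, allocate
 * second, and undo the reservation if the allocation fails.
 * alloc_fails simulates alloc_vfsmnt() returning NULL.
 */
static int model_clone_user_mount(bool is_admin, bool alloc_fails)
{
	int err = model_reserve_user_mount(is_admin);

	if (err)
		return err;
	if (alloc_fails) {
		model_release_user_mount();
		return -ENOMEM;
	}
	return 0;
}
```

With the default limit of zero every unprivileged attempt fails with -EPERM, and a privileged caller can push the count above the limit, which matches the behavior described in the proc.txt text added by patch 3/8.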
* [patch 6/8] put declaration of put_filesystem() in fs.h
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: put_filesystem_in_header.patch --]
[-- Type: text/plain, Size: 1282 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Declarations go into headers.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c	2007-04-04 19:29:41.000000000 +0200
+++ linux/fs/super.c	2007-04-04 19:30:05.000000000 +0200
@@ -40,10 +40,6 @@
 #include <asm/uaccess.h>
 
 
-void get_filesystem(struct file_system_type *fs);
-void put_filesystem(struct file_system_type *fs);
-struct file_system_type *get_fs_type(const char *name);
-
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:29:57.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:30:05.000000000 +0200
@@ -1858,6 +1858,8 @@ extern int vfs_fstat(unsigned int, struc
 extern int vfs_ioctl(struct file *, unsigned int, unsigned int,
 		     unsigned long);
 
+extern void get_filesystem(struct file_system_type *fs);
+extern void put_filesystem(struct file_system_type *fs);
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(struct block_device *);
 extern struct super_block *user_get_super(dev_t);

--
* [patch 7/8] allow unprivileged mounts
From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng

[-- Attachment #1: unprivileged_mount.patch --]
[-- Type: text/plain, Size: 3519 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Define a new fs flag FS_SAFE, which denotes that unprivileged
mounting of this filesystem may not constitute a security problem.

Since most filesystems haven't been designed with unprivileged
mounting in mind, a thorough audit is needed before setting this flag.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c	2007-04-04 19:30:02.000000000 +0200
+++ linux/fs/namespace.c	2007-04-04 19:30:08.000000000 +0200
@@ -757,11 +757,15 @@ asmlinkage long sys_oldumount(char __use
 
 #endif
 
-static int mount_is_safe(struct nameidata *nd, int *flags)
+static int mount_is_safe(struct nameidata *nd, struct file_system_type *type,
+			 int *flags)
 {
 	if (capable(CAP_SYS_ADMIN))
 		return 0;
 
+	if (type && !(type->fs_flags & FS_SAFE))
+		return -EPERM;
+
 	if (!S_ISDIR(nd->dentry->d_inode->i_mode) &&
 	    !S_ISREG(nd->dentry->d_inode->i_mode))
 		return -EPERM;
@@ -994,7 +998,7 @@ static int do_loopback(struct nameidata
 	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
-	int err = mount_is_safe(nd, &flags);
+	int err = mount_is_safe(nd, NULL, &flags);
 	if (err)
 		return err;
 	if (!old_name || !*old_name)
@@ -1156,26 +1160,46 @@ out:
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct nameidata *nd, char *type, int flags,
+static int do_new_mount(struct nameidata *nd, char *fstype, int flags,
 			int mnt_flags, char *name, void *data)
 {
+	int err;
 	struct vfsmount *mnt;
+	struct file_system_type *type;
 
-	if (!type || !memchr(type, 0, PAGE_SIZE))
+	if (!fstype || !memchr(fstype, 0, PAGE_SIZE))
 		return -EINVAL;
 
-	/* we need capabilities... */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	type = get_fs_type(fstype);
+	if (!type)
+		return -ENODEV;
 
-	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
-	if (IS_ERR(mnt))
+	err = mount_is_safe(nd, type, &flags);
+	if (err)
+		goto out_put_filesystem;
+
+	if (flags & MS_SETUSER) {
+		err = reserve_user_mount();
+		if (err)
+			goto out_put_filesystem;
+	}
+
+	mnt = vfs_kern_mount(type, flags & ~MS_SETUSER, name, data);
+	put_filesystem(type);
+	if (IS_ERR(mnt)) {
+		if (flags & MS_SETUSER)
+			dec_nr_user_mounts();
 		return PTR_ERR(mnt);
+	}
 
 	if (flags & MS_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
+
+ out_put_filesystem:
+	put_filesystem(type);
+	return err;
 }
 
 /*
@@ -1205,7 +1229,7 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	/* MNT_USER was set earlier */
+	/* some flags may have been set earlier */
 	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2007-04-04 19:30:05.000000000 +0200
+++ linux/include/linux/fs.h	2007-04-04 19:30:08.000000000 +0200
@@ -95,6 +95,7 @@ extern int dir_notify_enable;
 #define FS_REQUIRES_DEV 1
 #define FS_BINARY_MOUNTDATA 2
 #define FS_HAS_SUBTYPE 4
+#define FS_SAFE 8		/* Safe to mount by unprivileged users */
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.

--
* [patch 8/8] allow unprivileged fuse mounts 2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi ` (6 preceding siblings ...) 2007-04-04 18:30 ` [patch 7/8] allow unprivileged mounts Miklos Szeredi @ 2007-04-04 18:30 ` Miklos Szeredi 2007-04-09 18:57 ` [patch 0/8] unprivileged mount syscall Serge E. Hallyn [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> 9 siblings, 0 replies; 54+ messages in thread From: Miklos Szeredi @ 2007-04-04 18:30 UTC (permalink / raw) To: akpm; +Cc: linux-fsdevel, util-linux-ng [-- Attachment #1: fuse_safe.patch --] [-- Type: text/plain, Size: 2469 bytes --] From: Miklos Szeredi <mszeredi@suse.cz> Use FS_SAFE for "fuse" fs type, but not for "fuseblk". FUSE was designed from the beginning to be safe for unprivileged users. This has also been verified in practice over many years. The sysadmin still needs to set "fs.max_user_mounts" sysctl variable to a non-zero value to enable unprivileged mounts. This will enable future installations to remove the suid-root fusermount utility. Don't require the "user_id=" and "group_id=" options for unprivileged mounts, but if they are present verify them for sanity. Disallow the "allow_other" option for unprivileged mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> --- Index: linux/fs/fuse/inode.c =================================================================== --- linux.orig/fs/fuse/inode.c 2007-04-04 19:29:44.000000000 +0200 +++ linux/fs/fuse/inode.c 2007-04-04 19:30:11.000000000 +0200 @@ -311,6 +311,19 @@ static int parse_fuse_opt(char *opt, str d->max_read = ~0; d->blksize = 512; + /* + * For unprivileged mounts use current uid/gid. Still allow + * "user_id" and "group_id" options for compatibility, but + * only if they match these values. 
+ */ + if (!capable(CAP_SYS_ADMIN)) { + d->user_id = current->uid; + d->user_id_present = 1; + d->group_id = current->gid; + d->group_id_present = 1; + + } + while ((p = strsep(&opt, ",")) != NULL) { int token; int value; @@ -339,6 +352,8 @@ static int parse_fuse_opt(char *opt, str case OPT_USER_ID: if (match_int(&args[0], &value)) return 0; + if (d->user_id_present && d->user_id != value) + return 0; d->user_id = value; d->user_id_present = 1; break; @@ -346,6 +361,8 @@ static int parse_fuse_opt(char *opt, str case OPT_GROUP_ID: if (match_int(&args[0], &value)) return 0; + if (d->group_id_present && d->group_id != value) + return 0; d->group_id = value; d->group_id_present = 1; break; @@ -536,6 +553,10 @@ static int fuse_fill_super(struct super_ if (!parse_fuse_opt((char *) data, &d, is_bdev)) return -EINVAL; + /* This is a privileged option */ + if ((d.flags & FUSE_ALLOW_OTHER) && !capable(CAP_SYS_ADMIN)) + return -EPERM; + if (is_bdev) { #ifdef CONFIG_BLOCK if (!sb_set_blocksize(sb, d.blksize)) @@ -639,6 +660,6 @@ static struct file_system_type fuse_fs_t - .fs_flags = FS_HAS_SUBTYPE, + .fs_flags = FS_HAS_SUBTYPE | FS_SAFE, .get_sb = fuse_get_sb, .kill_sb = kill_anon_super, }; #ifdef CONFIG_BLOCK -- ^ permalink raw reply [flat|nested] 54+ messages in thread
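The "user_id="/"group_id=" handling described in patch 8/8 can be tried out in userspace with a small sketch like the one below. It is not the kernel code: struct mount_ids, parse_ids() and the "privileged" parameter (standing in for capable(CAP_SYS_ADMIN)) are hypothetical names, and strtok()/strtoul() replace the kernel's strsep()/match_int(). Only the rule itself is taken from the patch: an unprivileged caller gets its own uid/gid preset, and an explicit option is accepted only if it matches.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the fuse mount data. */
struct mount_ids {
	unsigned long user_id;
	unsigned long group_id;
	bool user_id_present;
	bool group_id_present;
};

/*
 * Parse a comma-separated option string in place.
 * Returns 1 on success and 0 on failure, like parse_fuse_opt().
 */
static int parse_ids(char *opt, struct mount_ids *d, bool privileged,
		     unsigned long cur_uid, unsigned long cur_gid)
{
	char *p;

	memset(d, 0, sizeof(*d));

	/* Unprivileged callers start out with their own ids. */
	if (!privileged) {
		d->user_id = cur_uid;
		d->user_id_present = true;
		d->group_id = cur_gid;
		d->group_id_present = true;
	}

	for (p = strtok(opt, ","); p != NULL; p = strtok(NULL, ",")) {
		if (strncmp(p, "user_id=", 8) == 0) {
			unsigned long value = strtoul(p + 8, NULL, 10);

			/* An id that is already set must not be changed. */
			if (d->user_id_present && d->user_id != value)
				return 0;
			d->user_id = value;
			d->user_id_present = true;
		} else if (strncmp(p, "group_id=", 9) == 0) {
			unsigned long value = strtoul(p + 9, NULL, 10);

			if (d->group_id_present && d->group_id != value)
				return 0;
			d->group_id = value;
			d->group_id_present = true;
		}
		/* All other options are ignored in this sketch. */
	}
	return 1;
}
```

With this rule an unprivileged user passing "user_id=0" is rejected, while the same string from a privileged caller is accepted.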
* Re: [patch 0/8] unprivileged mount syscall 2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi ` (7 preceding siblings ...) 2007-04-04 18:30 ` [patch 8/8] allow unprivileged fuse mounts Miklos Szeredi @ 2007-04-09 18:57 ` Serge E. Hallyn 2007-04-09 20:14 ` Miklos Szeredi [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> 9 siblings, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-09 18:57 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, linux-fsdevel, util-linux-ng Quoting Miklos Szeredi (miklos@szeredi.hu): > This patchset adds support for keeping mount ownership information in > the kernel, and allow unprivileged mount(2) and umount(2) in certain > cases. > > This can be useful for the following reasons: > > - mount(8) can store ownership ("user=XY" option) in the kernel > instead, or in addition to storing it in /etc/mtab. For example if > private namespaces are used with mount propagations /etc/mtab > becomes unworkable, but using /proc/mounts works fine > > - fuse won't need a special suid-root mount/umount utility. Plain > umount(8) can easily be made to work with unprivileged fuse mounts > > - users can use bind mounts without having to pre-configure them in > /etc/fstab > > All this is done in a secure way, and unprivileged bind and fuse > mounts are disabled by default and can be enabled through sysctl or > /proc/sys. > > One thing that is missing from this series is the ability to restrict > user mounts to private namespaces. The reason is that private > namespaces have still not gained the momentum and support needed for > painless user experience. So such a feature would not yet get enough > attention and testing. However adding such an optional restriction > can be done with minimal changes in the future, once private > namespaces have matured. What is the main reason for that feature? Would it be to prevent things like login from being tricked by user mounts? 
Isn't it sufficient, in fact, better, to require that the target of the mount be owned by the user doing the mount? -serge (who's pretty sure he's missing something) > An earlier version of these patches have been discussed here: > > http://lkml.org/lkml/2005/5/3/64 > > -- > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-09 18:57 ` [patch 0/8] unprivileged mount syscall Serge E. Hallyn @ 2007-04-09 20:14 ` Miklos Szeredi 2007-04-09 20:55 ` Serge E. Hallyn 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-09 20:14 UTC (permalink / raw) To: serue; +Cc: akpm, linux-fsdevel, util-linux-ng > > One thing that is missing from this series is the ability to restrict > > user mounts to private namespaces. The reason is that private > > namespaces have still not gained the momentum and support needed for > > painless user experience. So such a feature would not yet get enough > > attention and testing. However adding such an optional restriction > > can be done with minimal changes in the future, once private > > namespaces have matured. > > What is the main reason for that feature? Would it be to prevent things > like login from being tricked by user mounts? Isn't it sufficient, in > fact, better, to require that the target of the mount be owned by the > user doing the mount? It's been discussed later in that thread. Basically you can fool a lot of system programs (like backup) with mounting/binding in the global namespace. Restricting the destination doesn't always help. Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-09 20:14 ` Miklos Szeredi @ 2007-04-09 20:55 ` Serge E. Hallyn [not found] ` <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-09 20:55 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, linux-fsdevel, util-linux-ng Quoting Miklos Szeredi (miklos@szeredi.hu): > > > One thing that is missing from this series is the ability to restrict > > > user mounts to private namespaces. The reason is that private > > > namespaces have still not gained the momentum and support needed for > > > painless user experience. So such a feature would not yet get enough > > > attention and testing. However adding such an optional restriction > > > can be done with minimal changes in the future, once private > > > namespaces have matured. > > > > What is the main reason for that feature? Would it be to prevent things > > like login from being tricked by user mounts? Isn't it sufficient, in > > fact, better, to require that the target of the mount be owned by the > > user doing the mount? > > It's been discussed later in that thread. Basically you can fool a I see now, sorry. > lot of system programs (like backup) with mounting/binding in the > global namespace. Restricting the destination doesn't always help. > > Miklos It would be nice in general if we could avoid any sort of checks for (mnt->mnt_ns == init_nsproxy.mnt_ns). Maybe that won't be possible, but, taking the two listed examples: 1. mount --bind / ~/bindns; (later) userdel hallyn I assume userdel does a simple stupid rm -rf without first umounting, then? So (1) it seems wise to have userdel umount anything under ~user first anyway, and (2) if $USER does a mount --bind from a source he doesn't own, should we make the resulting mount read-only? (realizing the read-only bind mount patches are still under development :) Or is that overly restrictive somehow for fuse? 2. 
backups Is this just a 'he's going to fill up the whole disk' issue? Frankly, it seems wise to have cron or whatever is spawning the backup start in its own namespace right at boot. Generally when I think back on sites where I've dealt with backup, backups were done on a separate server which didn't allow user logins anyway, so it wouldn't be a problem. But I'm sure that's a limited (==erroneous) POV. I do realize that the whole problem about corner cases isn't addressing two little ones, but the fact that there are more we haven't thought of. So are there any currently known use cases where requiring a CLONE_NEWNS before user mounts is unacceptable? thanks, -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> @ 2007-04-11 19:43 ` Miklos Szeredi [not found] ` <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-11 19:43 UTC (permalink / raw) To: serue-r/Jw6+rmf7HQT0dZR+AlfA Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA > It would be nice in general if we could avoid any sort of checks for > (mnt->mnt_ns == init_nsproxy.mnt_ns). Maybe that won't be possible, > but, taking the two listed examples: [snip] It's probably worthwhile going after these problematic cases and fixing them; OTOH it's not easy to audit a complete system for holes arising from user mounts in the global namespace. So why not move this decision out of the kernel? How about adding a boolean flag to namespaces, which specifies whether unprivileged mounts are allowed or not. This would give complete flexibility to distro builders and sysadmins. The biggest problem I see is how to set this flag. There's no easy way to represent namespaces in /proc or /sys, and this is sufficiently obscure not to warrant a new syscall. Adding a new flag to prctl() could do the trick. Does that sound OK? Thanks, Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> @ 2007-04-11 20:05 ` Serge E. Hallyn 2007-04-11 20:41 ` Miklos Szeredi 0 siblings, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-11 20:05 UTC (permalink / raw) To: Miklos Szeredi Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org): > > It would be nice in general if we could avoid any sort of checks for > > (mnt->mnt_ns == init_nsproxy.mnt_ns). Maybe that won't be possible, > > but, taking the two listed examples: > > [snip] > > It's probably worthwhile going after these problematic cases, and > fixing them, OTOH it's not easy to audit a complete system for holes > arising from user mounts in the global namespace. > > So why not move this decision out from the kernel? How about adding a > boolean flag to namespaces, which specifies whether unprivileged > mounts are allowed or not. This would give complete flexibility to > distro builders and sysadmins. > > The biggest problem I see is how to set this flag. There's no easy > way to represent namespaces in /proc or /sys, and this is sufficiently > obscure not to warrant a new syscall. Adding a new flag to prctl() > could do the trick. Does that sound OK? Not objecting to prctl(), but two other options would be 1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is the time at which the ns is created, so in that sense it makes sense. 2. use the nsproxy container subsystem (see Paul Menage's containers patchset) to set this using, e.g., echo 1 > /containers/vserver1/mounts/usermount The prctl() method has the huge advantage of being implementable right now. -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-11 20:05 ` Serge E. Hallyn @ 2007-04-11 20:41 ` Miklos Szeredi 2007-04-11 20:57 ` Serge E. Hallyn 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-11 20:41 UTC (permalink / raw) To: serue; +Cc: akpm, linux-fsdevel, util-linux-ng > Not objecting to prctl(), but two other options would be > > 1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is > the time at which the ns is created, so in that sense it > makes sense. Yes, I thought about this, but there's no easy way to set the flag for the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would be needed to turn off the flag. > 2. use the nsproxy container subsystem (see Paul Menage's > containers patchset) to set this using, e.g., > > echo 1 > /containers/vserver1/mounts/usermount That again would lose some flexibility: only namespaces which are part of a container could be manipulated. Does that exclude the initial namespace? Also how would a process find out which vserver it is running in? Thanks, Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-11 20:41 ` Miklos Szeredi @ 2007-04-11 20:57 ` Serge E. Hallyn 0 siblings, 0 replies; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-11 20:57 UTC (permalink / raw) To: Miklos Szeredi; +Cc: akpm, linux-fsdevel, util-linux-ng Quoting Miklos Szeredi (miklos@szeredi.hu): > > Not objecting to prctl(), but two other options would be > > > > 1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is > > the time at which the ns is created, so in that sense it > > makes sense. > > Yes, I thought about this, but there's no easy way to set the flag for > the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would > be needed to turn off the flag. Not mentioning it would 'turn it off' for the cloned ns, but the default value for the initial namespace is still a problem. > > 2. use the nsproxy container subsystem (see Paul Menage's > > containers patchset) to set this using, e.g., > > > > echo 1 > /containers/vserver1/mounts/usermount > > That again would lose some flexibility: only namespaces which > are part of a container could be manipulated. In the nsproxy subsystem, every namespace gets a container so long as the nsproxy subsystem is mounted. > Does that exclude the > initial namespace? No, the initial namespace is tied to the root dentry - so if, as my example was assuming, you've done mount -t container -o ns none /containers then to change the setting for the initial namespace you would echo 0 > /containers/mounts/usermount > Also how would a process find out which vserver it is running in? cat /proc/$$/container -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> @ 2007-04-06 23:02 ` Andrew Morton 2007-04-06 23:16 ` H. Peter Anvin 2007-04-07 6:41 ` Miklos Szeredi 2007-04-09 22:00 ` Serge E. Hallyn 1 sibling, 2 replies; 54+ messages in thread From: Andrew Morton @ 2007-04-06 23:02 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Wed, 04 Apr 2007 20:30:12 +0200 Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote: > This patchset adds support for keeping mount ownership information in > the kernel, and allow unprivileged mount(2) and umount(2) in certain > cases. No replies, huh? My knowledge of the code which you're touching is not strong, and my spare reviewing capacity is not high. And this work does need close review by people who are familiar with the code which you're changing. So could I suggest that you go for a dig through the git history, identify some individuals who look like they know this code, then do a resend, cc'ing those people? Please also cc linux-kernel on that resend. > This can be useful for the following reasons: > > - mount(8) can store ownership ("user=XY" option) in the kernel > instead, or in addition to storing it in /etc/mtab. For example if > private namespaces are used with mount propagations /etc/mtab > becomes unworkable, but using /proc/mounts works fine > > - fuse won't need a special suid-root mount/umount utility. Plain > umount(8) can easily be made to work with unprivileged fuse mounts > > - users can use bind mounts without having to pre-configure them in > /etc/fstab > > All this is done in a secure way, and unprivileged bind and fuse > mounts are disabled by default and can be enabled through sysctl or > /proc/sys. 
> > One thing that is missing from this series is the ability to restrict > user mounts to private namespaces. The reason is that private > namespaces have still not gained the momentum and support needed for > painless user experience. So such a feature would not yet get enough > attention and testing. However adding such an optional restriction > can be done with minimal changes in the future, once private > namespaces have matured. I suspect the people who developed and maintain nsproxy would disagree ;) Please also cc containers-qjLDD68F18NYIhldQZh9Cg@public.gmane.org > An earlier version of these patches have been discussed here: > > http://lkml.org/lkml/2005/5/3/64 > > -- ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-06 23:02 ` Andrew Morton @ 2007-04-06 23:16 ` H. Peter Anvin 2007-04-06 23:55 ` Jan Engelhardt 2007-04-10 8:52 ` Ian Kent 2007-04-07 6:41 ` Miklos Szeredi 1 sibling, 2 replies; 54+ messages in thread From: H. Peter Anvin @ 2007-04-06 23:16 UTC (permalink / raw) To: Andrew Morton Cc: Miklos Szeredi, linux-fsdevel, util-linux-ng, containers, linux-kernel >> >> - users can use bind mounts without having to pre-configure them in >> /etc/fstab >> This is by far the biggest concern I see. I think the security implications of allowing anyone to do bind mounts are poorly understood. -hpa ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-06 23:16 ` H. Peter Anvin @ 2007-04-06 23:55 ` Jan Engelhardt 2007-04-07 0:22 ` H. Peter Anvin 2007-04-10 8:52 ` Ian Kent 1 sibling, 1 reply; 54+ messages in thread From: Jan Engelhardt @ 2007-04-06 23:55 UTC (permalink / raw) To: H. Peter Anvin Cc: Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng, containers, linux-kernel On Apr 6 2007 16:16, H. Peter Anvin wrote: >> > >> > - users can use bind mounts without having to pre-configure them in >> > /etc/fstab >> > > > This is by far the biggest concern I see. I think the security implication of > allowing anyone to do bind mounts are poorly understood. $ whoami miklos $ mount --bind / ~/down_under later that day: # userdel -r miklos So both the source (/) and target (~/down_under) directory must be owned by the user before --bind may succeed. There may be other implications hpa might want to fill us in on. Regards, Jan -- ^ permalink raw reply [flat|nested] 54+ messages in thread
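The ownership rule suggested in the message above (both source and target owned by the user) can be written down as a small userspace check. This is only a sketch of the idea, not code from the patchset: bind_allowed() is a hypothetical name, and a real implementation would do the test in the kernel on the already looked-up dentries rather than with stat() on path names.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths exist and are owned by uid, 0 otherwise.
 * With this rule "mount --bind / ~/down_under" would be refused for an
 * ordinary user, because "/" is normally owned by root.
 */
static int bind_allowed(const char *source, const char *target, uid_t uid)
{
	struct stat src, dst;

	if (stat(source, &src) != 0 || stat(target, &dst) != 0)
		return 0;

	return src.st_uid == uid && dst.st_uid == uid;
}
```

Note that such a check only covers the userdel case; as the follow-ups in the thread point out, it does not help with programs such as backup tools that walk the global namespace.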
* Re: [patch 0/8] unprivileged mount syscall 2007-04-06 23:55 ` Jan Engelhardt @ 2007-04-07 0:22 ` H. Peter Anvin 2007-04-07 3:40 ` Eric Van Hensbergen 0 siblings, 1 reply; 54+ messages in thread From: H. Peter Anvin @ 2007-04-07 0:22 UTC (permalink / raw) To: Jan Engelhardt Cc: Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng, containers, linux-kernel Jan Engelhardt wrote: > On Apr 6 2007 16:16, H. Peter Anvin wrote: >>>> - users can use bind mounts without having to pre-configure them in >>>> /etc/fstab >>>> >> This is by far the biggest concern I see. I think the security implication of >> allowing anyone to do bind mounts are poorly understood. > > $ whoami > miklos > $ mount --bind / ~/down_under > > later that day: > # userdel -r miklos > > So both the source (/) and target (~/down_under) directory must be owned > by the user before --bind may succeed. > > There may be other implications hpa might want to fill us in. Consider backups, for example. -hpa ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-07 0:22 ` H. Peter Anvin @ 2007-04-07 3:40 ` Eric Van Hensbergen [not found] ` <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Eric Van Hensbergen @ 2007-04-07 3:40 UTC (permalink / raw) To: H. Peter Anvin Cc: Jan Engelhardt, Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng, containers, linux-kernel On 4/6/07, H. Peter Anvin <hpa@zytor.com> wrote: > Jan Engelhardt wrote: > > On Apr 6 2007 16:16, H. Peter Anvin wrote: > >>>> - users can use bind mounts without having to pre-configure them in > >>>> /etc/fstab > >>>> > >> This is by far the biggest concern I see. I think the security implication of > >> allowing anyone to do bind mounts are poorly understood. > > > > $ whoami > > miklos > > $ mount --bind / ~/down_under > > > > later that day: > > # userdel -r miklos > > > > Consider backups, for example. > This is the reason why enforcing private namespaces for user mounts makes sense. I think it catches many of these corner cases. -eric ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2007-04-07 6:48 ` Miklos Szeredi 0 siblings, 0 replies; 54+ messages in thread From: Miklos Szeredi @ 2007-04-07 6:48 UTC (permalink / raw) To: ericvh-Re5JQEeQqe8AvxtiuMwx3w Cc: hpa-YMNOUZJC4hwAvxtiuMwx3w, jengelh-CujU1KeUx2fb/Wh9oZwLjA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, linux-kernel-u79uwXL29TY76Z2rM5mHXA > On 4/6/07, H. Peter Anvin <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> wrote: > > Jan Engelhardt wrote: > > > On Apr 6 2007 16:16, H. Peter Anvin wrote: > > >>>> - users can use bind mounts without having to pre-configure them in > > >>>> /etc/fstab > > >>>> > > >> This is by far the biggest concern I see. I think the security implication of > > >> allowing anyone to do bind mounts are poorly understood. > > > > > > $ whoami > > > miklos > > > $ mount --bind / ~/down_under > > > > > > later that day: > > > # userdel -r miklos > > > > > > > Consider backups, for example. > > > > This is the reason why enforcing private namespaces for user mounts > makes sense. I think it catches many of these corner cases. Yes, disabling user bind mounts in the global namespace makes sense. Enabling user fuse mounts in the global namespace still works though, even if a little kludgy. All these nasty corner cases have been thought through and validated by a lot of users. Thanks, Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-06 23:16 ` H. Peter Anvin 2007-04-06 23:55 ` Jan Engelhardt @ 2007-04-10 8:52 ` Ian Kent [not found] ` <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org> 1 sibling, 1 reply; 54+ messages in thread From: Ian Kent @ 2007-04-10 8:52 UTC (permalink / raw) To: H. Peter Anvin Cc: Andrew Morton, Miklos Szeredi, linux-fsdevel, util-linux-ng, containers, linux-kernel On Fri, 2007-04-06 at 16:16 -0700, H. Peter Anvin wrote: > >> > >> - users can use bind mounts without having to pre-configure them in > >> /etc/fstab > >> > > This is by far the biggest concern I see. I think the security > implication of allowing anyone to do bind mounts are poorly understood. And especially so since there is no way for a filesystem module to veto such requests. Ian ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org> @ 2007-04-11 10:48 ` Miklos Szeredi 2007-04-11 13:48 ` Ian Kent 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-11 10:48 UTC (permalink / raw) To: raven-PKsaG3nR2I+sTnJN9+BGXg Cc: hpa-YMNOUZJC4hwAvxtiuMwx3w, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, linux-kernel-u79uwXL29TY76Z2rM5mHXA > > >> > > >> - users can use bind mounts without having to pre-configure them in > > >> /etc/fstab > > >> > > > > This is by far the biggest concern I see. I think the security > > implication of allowing anyone to do bind mounts are poorly understood. > > And especially so since there is no way for a filesystem module to veto > such requests. The filesystem can't veto initial mounts based on destination either. I don't think it's up to the filesystem to police bind/move mounts in any way. Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-11 10:48 ` Miklos Szeredi @ 2007-04-11 13:48 ` Ian Kent [not found] ` <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Ian Kent @ 2007-04-11 13:48 UTC (permalink / raw) To: Miklos Szeredi Cc: hpa, akpm, linux-fsdevel, util-linux-ng, containers, linux-kernel On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote: > > > >> > > > >> - users can use bind mounts without having to pre-configure them in > > > >> /etc/fstab > > > >> > > > > > > This is by far the biggest concern I see. I think the security > > > implication of allowing anyone to do bind mounts are poorly understood. > > > > And especially so since there is no way for a filesystem module to veto > > such requests. > > The filesystem can't veto initial mounts based on destination either. > I don't think it's up to the filesystem to police bind/move mounts in > any way. But if a filesystem can't, or the developer thinks that it shouldn't for some reason, support bind/move mounts, then there should be a way for the filesystem to tell the kernel that. Surely a filesystem is in a good position to be able to decide if a mount request "for it" should be allowed to continue based on its "own situation and capabilities". Ian ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org> @ 2007-04-11 14:26 ` Serge E. Hallyn [not found] ` <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-11 14:26 UTC (permalink / raw) To: Ian Kent Cc: Miklos Szeredi, hpa-YMNOUZJC4hwAvxtiuMwx3w, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, linux-kernel-u79uwXL29TY76Z2rM5mHXA Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org): > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote: > > > > >> > > > > >> - users can use bind mounts without having to pre-configure them in > > > > >> /etc/fstab > > > > >> > > > > > > > > This is by far the biggest concern I see. I think the security > > > > implication of allowing anyone to do bind mounts are poorly understood. > > > > > > And especially so since there is no way for a filesystem module to veto > > > such requests. > > > > The filesystem can't veto initial mounts based on destination either. > > I don't think it's up to the filesystem to police bind/move mounts in > > any way. > > But if a filesystem can't or the developer thinks that it shouldn't for > some reason, support bind/move mounts then there should be a way for the Can you list some valid reasons why an fs could care where it is mounted? The only thing I could think of is a stackable fs, but it shouldn't care whether it is overlay-mounted or not. thanks, -serge > filesystem to tell the kernel that. > > Surely a filesystem is in a good position to be able to decide if a > mount request "for it" should be allowed to continue based on it's "own > situation and capabilities". 
> > Ian > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> @ 2007-04-11 14:27 ` Ian Kent [not found] ` <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Ian Kent @ 2007-04-11 14:27 UTC (permalink / raw) To: Serge E. Hallyn Cc: Miklos Szeredi, hpa-YMNOUZJC4hwAvxtiuMwx3w, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote: > Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org): > > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote: > > > > > >> > > > > > >> - users can use bind mounts without having to pre-configure them in > > > > > >> /etc/fstab > > > > > >> > > > > > > > > > > This is by far the biggest concern I see. I think the security > > > > > implication of allowing anyone to do bind mounts are poorly understood. > > > > > > > > And especially so since there is no way for a filesystem module to veto > > > > such requests. > > > > > > The filesystem can't veto initial mounts based on destination either. > > > I don't think it's up to the filesystem to police bind/move mounts in > > > any way. > > > > But if a filesystem can't or the developer thinks that it shouldn't for > > some reason, support bind/move mounts then there should be a way for the > > Can you list some valid reasons why an fs could care where it is > mounted? The only thing I could think of is a stackable fs, but it > shouldn't care whether it is overlay-mounted or not. For my part, autofs and autofs4. Moving or binding isn't valid. I tried to design that limitation out version 5 but wasn't able to. In time I probably can but couldn't continue to support older versions. > > thanks, > -serge > > > filesystem to tell the kernel that. 
> > > > Surely a filesystem is in a good position to be able to decide if a > > mount request "for it" should be allowed to continue based on it's "own > > situation and capabilities". > > > > Ian > > > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org> @ 2007-04-11 14:45 ` Serge E. Hallyn 0 siblings, 0 replies; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-11 14:45 UTC (permalink / raw) To: Ian Kent Cc: Miklos Szeredi, hpa-YMNOUZJC4hwAvxtiuMwx3w, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, linux-kernel-u79uwXL29TY76Z2rM5mHXA Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org): > On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote: > > Quoting Ian Kent (raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org): > > > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote: > > > > > > >> > > > > > > >> - users can use bind mounts without having to pre-configure them in > > > > > > >> /etc/fstab > > > > > > >> > > > > > > > > > > > > This is by far the biggest concern I see. I think the security > > > > > > implication of allowing anyone to do bind mounts are poorly understood. > > > > > > > > > > And especially so since there is no way for a filesystem module to veto > > > > > such requests. > > > > > > > > The filesystem can't veto initial mounts based on destination either. > > > > I don't think it's up to the filesystem to police bind/move mounts in > > > > any way. > > > > > > But if a filesystem can't or the developer thinks that it shouldn't for > > > some reason, support bind/move mounts then there should be a way for the > > > > Can you list some valid reasons why an fs could care where it is > > mounted? The only thing I could think of is a stackable fs, but it > > shouldn't care whether it is overlay-mounted or not. > > For my part, autofs and autofs4. Ah, thanks. I can see I'm going to have start using autofs to get to know the implementation, because it seems clear we'll run into it in the containers work again (beyond the struct pid conv) at some point. 
> Moving or binding isn't valid. > I tried to design that limitation out version 5 but wasn't able to. > In time I probably can but couldn't continue to support older versions. thanks, -serge > > > > thanks, > > -serge > > > > > filesystem to tell the kernel that. > > > > > > Surely a filesystem is in a good position to be able to decide if a > > > mount request "for it" should be allowed to continue based on it's "own > > > situation and capabilities". > > > > > > Ian > > > > > > > > > > > > - > > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall
2007-04-06 23:02 ` Andrew Morton
2007-04-06 23:16 ` H. Peter Anvin
@ 2007-04-07 6:41 ` Miklos Szeredi
[not found] ` <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
1 sibling, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-07 6:41 UTC (permalink / raw)
To: akpm; +Cc: linux-fsdevel, util-linux-ng, containers, linux-kernel

> > This patchset adds support for keeping mount ownership information in
> > the kernel, and allow unprivileged mount(2) and umount(2) in certain
> > cases.
>
> No replies, huh?

All we need is a comment from Andrew, and the replies come flooding in ;)

> My knowledge of the code which you're touching is not strong, and my spare
> reviewing capacity is not high. And this work does need close review by
> people who are familiar with the code which you're changing.
>
> So could I suggest that you go for a dig through the git history, identify
> some individuals who look like they know this code, then do a resend,
> cc'ing those people? Please also cc linux-kernel on that resend.

OK.

> > One thing that is missing from this series is the ability to restrict
> > user mounts to private namespaces. The reason is that private
> > namespaces have still not gained the momentum and support needed for
> > painless user experience. So such a feature would not yet get enough
> > attention and testing. However adding such an optional restriction
> > can be done with minimal changes in the future, once private
> > namespaces have matured.
>
> I suspect the people who developed and maintain nsproxy would disagree ;)

Well, they better show me some working and simple-to-use userspace
code, because I've not seen anything like that related to mount
namespaces.

pam_namespace.so is one example of a non-working, but
probably-not-too-hard-to-fix one.

I'm just saying this is not yet something that Joe Blow would just
enable by ticking a box in their desktop setup wizard, and it would
all work flawlessly thereafter. There's still a _long_ way towards
that, and mostly in userspace.

Thanks,
Miklos
^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]
* Re: [patch 0/8] unprivileged mount syscall
[not found] ` <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-09 14:38 ` Serge E. Hallyn
[not found] ` <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
0 siblings, 1 reply; 54+ messages in thread
From: Serge E. Hallyn @ 2007-04-09 14:38 UTC (permalink / raw)
To: Miklos Szeredi
Cc: akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

Quoting Miklos Szeredi (miklos@szeredi.hu):
> > > This patchset adds support for keeping mount ownership information in
> > > the kernel, and allow unprivileged mount(2) and umount(2) in certain
> > > cases.
> >
> > No replies, huh?
>
> All we need is a comment from Andrew, and the replies come flooding in ;)
>
> > My knowledge of the code which you're touching is not strong, and my spare
> > reviewing capacity is not high. And this work does need close review by
> > people who are familiar with the code which you're changing.
> >
> > So could I suggest that you go for a dig through the git history, identify
> > some individuals who look like they know this code, then do a resend,
> > cc'ing those people? Please also cc linux-kernel on that resend.
>
> OK.
>
> > > One thing that is missing from this series is the ability to restrict
> > > user mounts to private namespaces. The reason is that private
> > > namespaces have still not gained the momentum and support needed for
> > > painless user experience. So such a feature would not yet get enough
> > > attention and testing. However adding such an optional restriction
> > > can be done with minimal changes in the future, once private
> > > namespaces have matured.
> >
> > I suspect the people who developed and maintain nsproxy would disagree ;)
>
> Well, they better show me some working and simple-to-use userspace
> code, because I've not seen anything like that related to mount
> namespaces.

If you mean to test/exploit them, see
http://lxc.sourceforge.net/patches/2.6.20/2.6.20-lxc8/broken-out/tests/

Compile the ns_exec.c program and do

	ns_exec -m /bin/sh

to get a shell in a new mounts namespace.

> pam_namespace.so is one example of a non-working, but
> probably-not-too-hard-to-fix one.

Non-working? I sure hope the one used for LSPP certification is
working... As is the ugly version I wrote 18 months ago and use on my
laptop.

> I'm just saying this is not yet something that Joe Blow would just
> enable by ticking a box in their desktop setup wizard, and it would
> all work flawlessly thereafter. There's still a _long_ way towards
> that, and mostly in userspace.

I'm not sure there's that long a way to go, but clearly we need to be
showing users what they can do, or they'll never work their way towards
there.

For instance, as you say, a user admin gui with a checkmark and text
boxes saying 'enter new namespace on login', 'create private /tmp',
and 'create private dmcrypted /home' would be trivial right now.

-serge
^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>]
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> @ 2007-04-09 16:24 ` Miklos Szeredi 2007-04-09 17:07 ` Serge E. Hallyn 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-09 16:24 UTC (permalink / raw) To: serue-r/Jw6+rmf7HQT0dZR+AlfA Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA > > > > One thing that is missing from this series is the ability to restrict > > > > user mounts to private namespaces. The reason is that private > > > > namespaces have still not gained the momentum and support needed for > > > > painless user experience. So such a feature would not yet get enough > > > > attention and testing. However adding such an optional restriction > > > > can be done with minimal changes in the future, once private > > > > namespaces have matured. > > > > > > I suspect the people who developed and maintain nsproxy would disagree ;) > > > > Well, they better show me some working and simple-to-use userspace > > code, because I've not seen anything like that related to mount > > namespaces. > > If you mean to test/exploit them, see > http://lxc.sourceforge.net/patches/2.6.20/2.6.20-lxc8/broken-out/tests/ > > Compile the ns_exec.c program and do > > ns_exec -m /bin/sh > > to get a shell in a new mounts namespace. Cool, thanks. This is a very nice utility for testing, but for the end user rather useless: - user starts up a private namespace in a shell, mounts something - then opens app from menu, tries to access mount, but the mount is not there - user unhappy BTW, looking at -mm unshare() on namespace is not privileged any more. Why is that? Or rather, what's the reason, that clone() is privileged and unshare() is not? > > pam_namespace.so is one example of a non-working, but probably-not-too- > > hard-to-fix one. 
> > Non-working? I sure hope the one used for LSPP certification is > working... As is the ugly version I wrote 18 mounts ago and use on my > laptop. The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken. Are you interested in the details? I can reproduce it, but forgot to note down the details of the brokenness. > > I'm just saying this is not yet something that Joe Blow would just > > enable by ticking a box in their desktop setup wizard, and it would > > all work flawlessly thereafter. There's still a _long_ way towards > > that, and mostly in userspace. > > I'm not sure there's a that long a way to go, but clearly we need to be > showing users what they can do, or they'll never work their way towards > there. There _is_ a long way to go. Random things that spring to my mind: - using /etc/mtab is broken with private namespaces, using /proc/mounts is missing various functionality, that /etc/mtab has, for example the "user" option, which this patchset adds - need to set up mount propagation from global namespace to private ones, mount(8) does not yet have options to configure propagation - user namespace setup: what if user has multiple sessions? 1) namespaces are shared? That's tricky because the session needs to be a child of a namespace server, not of login. I'm not sure PAM can handle this 2) or mounts are copied on login? That's not possible currently, as there's no way to send a mount between namespaces. Also it's tricky to make sure that new mounts are also shared > For instance, as you say, a user admin gui with a checkmark and text > boxes saying 'enter new namespace on login', 'create private /tmp', > and 'create private dmcrypted /home' would be trivial right now. Trivial modulo the above slightly non-trivial exemptions ;) Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
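To illustrate the "user" option mentioned in the first item above: patch 1/8 of this series adds a "user=UID" option to /proc/PID/mounts for MNT_USER mounts. Below is a minimal sketch of how a tool might pick out the mounts owned by one user from that format. The function name list_user_mounts is made up for this example, and the option only appears on a kernel that carries this patchset.

```shell
#!/bin/sh
# list_user_mounts UID  (hypothetical helper, not part of mount(8))
# Reads lines in /proc/mounts format on stdin:
#   device mountpoint fstype options dump pass
# and prints the mount point of every entry whose comma-separated
# option list contains exactly "user=UID".
list_user_mounts() {
    awk -v uid="$1" '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            # Compare the whole option, so user=100 never matches user=1000.
            if (opts[i] == ("user=" uid)) {
                print $2
                break
            }
        }
    }'
}
```

A likely invocation would be `list_user_mounts "$(id -u)" < /proc/mounts`. On a kernel without the patchset this simply prints nothing, since no entry carries the option.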
* Re: [patch 0/8] unprivileged mount syscall 2007-04-09 16:24 ` Miklos Szeredi @ 2007-04-09 17:07 ` Serge E. Hallyn 2007-04-09 17:46 ` Ram Pai 2007-04-09 20:10 ` Miklos Szeredi 0 siblings, 2 replies; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-09 17:07 UTC (permalink / raw) To: Miklos Szeredi Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel, Ram Pai Quoting Miklos Szeredi (miklos@szeredi.hu): > > > > > One thing that is missing from this series is the ability to restrict > > > > > user mounts to private namespaces. The reason is that private > > > > > namespaces have still not gained the momentum and support needed for > > > > > painless user experience. So such a feature would not yet get enough > > > > > attention and testing. However adding such an optional restriction > > > > > can be done with minimal changes in the future, once private > > > > > namespaces have matured. > > > > > > > > I suspect the people who developed and maintain nsproxy would disagree ;) > > > > > > Well, they better show me some working and simple-to-use userspace > > > code, because I've not seen anything like that related to mount > > > namespaces. > > > > If you mean to test/exploit them, see > > http://lxc.sourceforge.net/patches/2.6.20/2.6.20-lxc8/broken-out/tests/ > > > > Compile the ns_exec.c program and do > > > > ns_exec -m /bin/sh > > > > to get a shell in a new mounts namespace. > > Cool, thanks. This is a very nice utility for testing, but for the > end user rather useless: Well that depends on which end-user. Those wanting to create a vserver or checkpoint-restart job will want this, but clearly we have a long way to go for that upstream anyway. > - user starts up a private namespace in a shell, mounts something > > - then opens app from menu, tries to access mount, but the mount is > not there > > - user unhappy > > BTW, looking at -mm unshare() on namespace is not privileged any more. > Why is that? 
Or rather, what's the reason, that clone() is privileged > and unshare() is not? The check is still there - see kernel/nsproxy.c:unshare_nsproxy_namespaces(). > > > pam_namespace.so is one example of a non-working, but probably-not-too- > > > hard-to-fix one. > > > > Non-working? I sure hope the one used for LSPP certification is > > working... As is the ugly version I wrote 18 mounts ago and use on my > > laptop. > > The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken. Are > you interested in the details? I can reproduce it, but forgot to note > down the details of the brokenness. I don't know how far removed that is from the one being used by redhat, but assuming it's the same, then redhat-lspp@redhat.com will be very interested. > > > I'm just saying this is not yet something that Joe Blow would just > > > enable by ticking a box in their desktop setup wizard, and it would > > > all work flawlessly thereafter. There's still a _long_ way towards > > > that, and mostly in userspace. > > > > I'm not sure there's a that long a way to go, but clearly we need to be > > showing users what they can do, or they'll never work their way towards > > there. > > There _is_ a long way to go. Random things that spring to my mind: > > - using /etc/mtab is broken with private namespaces, using > /proc/mounts is missing various functionality, that /etc/mtab has, > for example the "user" option, which this patchset adds Agreed those need fixing. > - need to set up mount propagation from global namespace to private > ones, mount(8) does not yet have options to configure propagation Hmm, I guess I get lost using my own little systems, and just assumed that shared subtree functionality was making its way up into mount(8). Ram, have you been working on that? > - user namespace setup: what if user has multiple sessions? > > 1) namespaces are shared? That's tricky because the session needs to > be a child of a namespace server, not of login. 
I'm not sure PAM > can handle this > > 2) or mounts are copied on login? That's not possible currently, > as there's no way to send a mount between namespaces. Also it's > tricky to make sure that new mounts are also shared See toward the end of the 'shared subtrees' OLS paper from last year for a suggestion on how to let users effectively 'log in to' an existing private mounts ns. > > For instance, as you say, a user admin gui with a checkmark and text > > boxes saying 'enter new namespace on login', 'create private /tmp', > > and 'create private dmcrypted /home' would be trivial right now. > > Trivial modulo the above slightly non-trivial exemptions ;) Ok, so it can use some very non-trivial fine-tuning... But I've been using the above - minus the trivial gui - for over a year without ever worrying about any of these short-comings. > Miklos -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall
2007-04-09 17:07 ` Serge E. Hallyn
@ 2007-04-09 17:46 ` Ram Pai
2007-04-09 18:25 ` H. Peter Anvin
2007-04-10 10:33 ` Karel Zak
2007-04-09 20:10 ` Miklos Szeredi
1 sibling, 2 replies; 54+ messages in thread
From: Ram Pai @ 2007-04-09 17:46 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Miklos Szeredi, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

On Mon, 2007-04-09 at 12:07 -0500, Serge E. Hallyn wrote:
> Quoting Miklos Szeredi (miklos@szeredi.hu):
> > - need to set up mount propagation from global namespace to private
> >   ones, mount(8) does not yet have options to configure propagation
>
> Hmm, I guess I get lost using my own little systems, and just assumed
> that shared subtree functionality was making its way up into mount(8).
> Ram, have you been working on that?

It is in FC6. I don't know the status of upstream util-linux. I did
submit the patch many times to Adrian Bunk (the then util-linux
maintainer) and got no response. I have not pushed the patches to the
new maintainer (Karel Zak?) though.

RP
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall
2007-04-09 17:46 ` Ram Pai
@ 2007-04-09 18:25 ` H. Peter Anvin
2007-04-10 10:33 ` Karel Zak
1 sibling, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2007-04-09 18:25 UTC (permalink / raw)
To: Ram Pai
Cc: Serge E. Hallyn, Miklos Szeredi, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

Ram Pai wrote:
>
> It is in FC6. I don't know the status of upstream util-linux. I did
> submit the patch many times to Adrian Bunk (the then util-linux
> maintainer) and got no response. I have not pushed the patches to the
> new maintainer (Karel Zak?) though.
>

Well, do that, then :)

Seriously. The whole point of util-linux-ng is to make forward progress.

-hpa
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall
2007-04-09 17:46 ` Ram Pai
2007-04-09 18:25 ` H. Peter Anvin
@ 2007-04-10 10:33 ` Karel Zak
1 sibling, 0 replies; 54+ messages in thread
From: Karel Zak @ 2007-04-10 10:33 UTC (permalink / raw)
To: Ram Pai
Cc: Serge E. Hallyn, Miklos Szeredi, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

On Mon, Apr 09, 2007 at 10:46:25AM -0700, Ram Pai wrote:
> On Mon, 2007-04-09 at 12:07 -0500, Serge E. Hallyn wrote:
> > Quoting Miklos Szeredi (miklos@szeredi.hu):
> > > - need to set up mount propagation from global namespace to private
> > >   ones, mount(8) does not yet have options to configure propagation
> >
> > Hmm, I guess I get lost using my own little systems, and just assumed
> > that shared subtree functionality was making its way up into mount(8).
> > Ram, have you been working on that?
>
> It is in FC6. I don't know the status of upstream util-linux. I did
> submit the patch many times to Adrian Bunk (the then util-linux
> maintainer) and got no response. I have not pushed the patches to the
> new maintainer (Karel Zak?) though.

The "shared-subtree" patch has been applied:

http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=commitdiff;h=389fbea536e4308d9475fa2a89e53e188ce8a0e3;hp=939a997de0c761d29fb7530976ca20da4898703a

Karel

--
Karel Zak <kzak@redhat.com>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-09 17:07 ` Serge E. Hallyn 2007-04-09 17:46 ` Ram Pai @ 2007-04-09 20:10 ` Miklos Szeredi 2007-04-10 8:38 ` Ram Pai 1 sibling, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-09 20:10 UTC (permalink / raw) To: serue Cc: akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel, linuxram > > The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken. Are > > you interested in the details? I can reproduce it, but forgot to note > > down the details of the brokenness. > > I don't know how far removed that is from the one being used by redhat, > but assuming it's the same, then redhat-lspp@redhat.com will be > very interested. OK. > > - user namespace setup: what if user has multiple sessions? > > > > 1) namespaces are shared? That's tricky because the session needs to > > be a child of a namespace server, not of login. I'm not sure PAM > > can handle this > > > > 2) or mounts are copied on login? That's not possible currently, > > as there's no way to send a mount between namespaces. Also it's > > tricky to make sure that new mounts are also shared > > See toward the end of the 'shared subtrees' OLS paper from last year for > a suggestion on how to let users effectively 'log in to' an existing > private mounts ns. This? 1. create a new namespace 2. bind /share/$USER to /share 3. for each pair ($who, $what) such that /share/$USER/$who/$what exists, look in /share/$who/allowed for "peer $what $USER" or "slave $what $USER". If the former is found, rbind /share/$who/$what on /share/$USER/$who/$what; if the latter is found, do the same and follow with marking subtree under /share/$USER/$who/$what as slave. 4. rbind /share/$USER to /share 5. mark subtree under /share as private. 6. umount -l /share Well, someone please explain using short words, because I don't understand at all. Thanks, Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-09 20:10 ` Miklos Szeredi @ 2007-04-10 8:38 ` Ram Pai 2007-04-11 10:44 ` Miklos Szeredi 0 siblings, 1 reply; 54+ messages in thread From: Ram Pai @ 2007-04-10 8:38 UTC (permalink / raw) To: Miklos Szeredi Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel On Mon, 2007-04-09 at 22:10 +0200, Miklos Szeredi wrote: > > > The one in pam-0.99.6.3-29.1 in opensuse-10.2 is totally broken. Are > > > you interested in the details? I can reproduce it, but forgot to note > > > down the details of the brokenness. > > > > I don't know how far removed that is from the one being used by redhat, > > but assuming it's the same, then redhat-lspp@redhat.com will be > > very interested. > > OK. > > > > - user namespace setup: what if user has multiple sessions? > > > > > > 1) namespaces are shared? That's tricky because the session needs to > > > be a child of a namespace server, not of login. I'm not sure PAM > > > can handle this > > > > > > 2) or mounts are copied on login? That's not possible currently, > > > as there's no way to send a mount between namespaces. Also it's > > > tricky to make sure that new mounts are also shared > > > > See toward the end of the 'shared subtrees' OLS paper from last year for > > a suggestion on how to let users effectively 'log in to' an existing > > private mounts ns. > > This? > > 1. create a new namespace > 2. bind /share/$USER to /share > 3. for each pair ($who, $what) such that > /share/$USER/$who/$what exists, look > in /share/$who/allowed for "peer $what > $USER" or "slave $what $USER". If the > former is found, rbind /share/$who/$what > on /share/$USER/$who/$what; if the > latter is found, do the same and > follow with marking subtree under > /share/$USER/$who/$what as slave. > 4. rbind /share/$USER to /share > 5. mark subtree under /share as private. > 6. umount -l /share > > Well, someone please explain using short words, because I don't > understand at all. 

I am trying to reconstruct Viro's thoughts. I think the steps outlined
above, though not accurate, are still insightful.

The idea is this: there is one master namespace which has, under
/share, a replica of the mount tree of the namespaces belonging to each
user.

For example, if there are two users A and B, then in the master
namespace under /share you will find /share/A and /share/B, each
reflecting the mount tree of the namespaces belonging to user A and
user B respectively.

Note: /share is a shared mount tree, which means it can propagate mount
events.

Every time the user logs on to the machine, a new namespace is created
which is a clone of the master namespace. In this new namespace,
/share/$user is made the root of the namespace. Also, if other users
have made part of their namespace available to this user, then those
mounts are also brought under this namespace. And finally the entire
tree under /share is unmounted.

Note that though multiple namespaces can exist simultaneously for the
same user, the user is provided the illusion of per-process-namespace
since all the namespaces look identical.

Here is my rewrite of the steps outlined above. It may or may not
reflect Viro's thoughts, but it certainly reflects my reconstruction of
them.

1. clone the master namespace.

2. in the new namespace

	move the tree under /share/$me to /
	for each ($user, $what, $how) {
		move /share/$user/$what to /$what
		if ($how == slave) {
			make the mount tree under /$what as slave
		}
	}

3. in the new namespace make the tree under
   /share as private and unmount /share

RP

>
> Thanks,
> Miklos
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall
2007-04-10 8:38 ` Ram Pai
@ 2007-04-11 10:44 ` Miklos Szeredi
[not found] ` <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
0 siblings, 1 reply; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-11 10:44 UTC (permalink / raw)
To: linuxram
Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

> 1. clone the master namespace.
>
> 2. in the new namespace
>
> 	move the tree under /share/$me to /
> 	for each ($user, $what, $how) {
> 		move /share/$user/$what to /$what
> 		if ($how == slave) {
> 			make the mount tree under /$what as slave
> 		}
> 	}
>
> 3. in the new namespace make the tree under
>    /share as private and unmount /share

Thanks. I get the basic idea now: the namespace itself need not be
shared between the sessions, it is enough if "share" propagation is
set up between the different namespaces of a user.

I don't yet see either in your or Viro's description how the trees
under /share/$USER are initialized. I guess they are recursively
bound from /, and are made slaves.

Miklos
^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]
* Re: [patch 0/8] unprivileged mount syscall
[not found] ` <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
@ 2007-04-11 18:28 ` Ram Pai
[not found] ` <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
0 siblings, 1 reply; 54+ messages in thread
From: Ram Pai @ 2007-04-11 18:28 UTC (permalink / raw)
To: Miklos Szeredi
Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > 1. clone the master namespace.
> >
> > 2. in the new namespace
> >
> > 	move the tree under /share/$me to /
> > 	for each ($user, $what, $how) {
> > 		move /share/$user/$what to /$what
> > 		if ($how == slave) {
> > 			make the mount tree under /$what as slave
> > 		}
> > 	}
> >
> > 3. in the new namespace make the tree under
> >    /share as private and unmount /share
>
> Thanks. I get the basic idea now: the namespace itself need not be
> shared between the sessions, it is enough if "share" propagation is
> set up between the different namespaces of a user.
>
> I don't yet see either in your or Viro's description how the trees
> under /share/$USER are initialized. I guess they are recursively
> bound from /, and are made slaves.

Yes. I suppose that when a user ID is created, one of the steps would be

	mount --rbind / /share/$USER
	mount --make-rslave /share/$USER
	mount --make-rshared /share/$USER

RP

> Miklos
^ permalink raw reply [flat|nested] 54+ messages in thread
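The three mount commands above can be wrapped into a small helper, sketched here under some assumptions: the names run and init_user_share are made up, the mkdir step is an addition not mentioned in the thread, and the helper only prints the commands unless DRY_RUN=0 is set, since actually running them needs root. Marking the tree slave and then shared should leave it receiving mount events from / while still propagating among its own later clones.

```shell
#!/bin/sh
# run CMD...  (hypothetical helper)
# Prints the command by default; executes it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# init_user_share USER  (hypothetical helper)
# Sets up /share/USER as a recursive bind of / that is both a slave of
# the original tree and shared among its future copies.
# The mkdir step is an assumption; the thread only lists the mounts.
init_user_share() {
    target="/share/$1"
    run mkdir -p "$target" &&
    run mount --rbind / "$target" &&
    run mount --make-rslave "$target" &&
    run mount --make-rshared "$target"
}
```

Calling `init_user_share alice` prints the four commands for review; `DRY_RUN=0 init_user_share alice` as root would run them.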
[parent not found: <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>]
* Re: [patch 0/8] unprivileged mount syscall
[not found] ` <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
@ 2007-04-13 11:58 ` Miklos Szeredi
[not found] ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-13 20:07 ` Karel Zak
0 siblings, 2 replies; 54+ messages in thread
From: Miklos Szeredi @ 2007-04-13 11:58 UTC (permalink / raw)
To: linuxram
Cc: serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel

> On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > > 1. clone the master namespace.
> > >
> > > 2. in the new namespace
> > >
> > > 	move the tree under /share/$me to /
> > > 	for each ($user, $what, $how) {
> > > 		move /share/$user/$what to /$what
> > > 		if ($how == slave) {
> > > 			make the mount tree under /$what as slave
> > > 		}
> > > 	}
> > >
> > > 3. in the new namespace make the tree under
> > >    /share as private and unmount /share
> >
> > Thanks. I get the basic idea now: the namespace itself need not be
> > shared between the sessions, it is enough if "share" propagation is
> > set up between the different namespaces of a user.
> >
> > I don't yet see either in your or Viro's description how the trees
> > under /share/$USER are initialized. I guess they are recursively
> > bound from /, and are made slaves.
>
> yes. I suppose, when a userid is created one of the steps would be
>
> 	mount --rbind / /share/$USER
> 	mount --make-rslave /share/$USER
> 	mount --make-rshared /share/$USER

Thinking a bit more about this, I'm quite sure most users wouldn't
even want private namespaces. It would be enough to

	chroot /share/$USER

and be done with it.

Private namespaces are only good for keeping a bunch of mounts
referenced by a group of processes. But my guess is that the natural
behavior for users is to see a persistent set of mounts.

If for example they mount something on a remote machine, then log out
from the ssh session and later log back in, they would want to see
their previous mount still there.

Miklos
^ permalink raw reply [flat|nested] 54+ messages in thread
[parent not found: <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>]
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> @ 2007-04-13 13:28 ` Serge E. Hallyn 2007-04-13 14:05 ` Miklos Szeredi 2007-04-16 7:59 ` Ram Pai 1 sibling, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-13 13:28 UTC (permalink / raw) To: Miklos Szeredi Cc: linuxram-r/Jw6+rmf7HQT0dZR+AlfA, serue-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org): > > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote: > > > > 1. clone the master namespace. > > > > > > > > 2. in the new namespace > > > > > > > > move the tree under /share/$me to / > > > > for each ($user, $what, $how) { > > > > move /share/$user/$what to /$what > > > > if ($how == slave) { > > > > make the mount tree under /$what as slave > > > > } > > > > } > > > > > > > > 3. in the new namespace make the tree under > > > > /share as private and unmount /share > > > > > > Thanks. I get the basic idea now: the namespace itself need not be > > > shared between the sessions, it is enough if "share" propagation is > > > set up between the different namespaces of a user. > > > > > > I don't yet see either in your or Viro's description how the trees > > > under /share/$USER are initialized. I guess they are recursively > > > bound from /, and are made slaves. > > > > yes. I suppose, when a userid is created one of the steps would be > > > > mount --rbind / /share/$USER > > mount --make-rslave /share/$USER > > mount --make-rshared /share/$USER > > Thinking a bit more about this, I'm quite sure most users wouldn't > even want private namespaces. It would be enough to > > chroot /share/$USER > > and be done with it. 
> > Private namespaces are only good for keeping a bunch of mounts > referenced by a group of processes. But my guess is, that the natural > behavior for users is to see a persistent set of mounts. > > If for example they mount something on a remote machine, then log out > from the ssh session and later log back in, they would want to see > their previous mount still there. > > Miklos Agreed on desired behavior, but not on chroot sufficing. It actually sounds like you want exactly what was outlined in the OLS paper. Users still need to be in a different mounts namespace from the admin user so long as we consider the deluser and backup problems to be legitimate problems (well, so long as user mounts are allowed). So, when they log in, pam gives them a new namespace and chroots them into /share/$USER. Assuming I'm thinking clearly :) -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-13 13:28 ` Serge E. Hallyn @ 2007-04-13 14:05 ` Miklos Szeredi 2007-04-13 21:44 ` Serge E. Hallyn [not found] ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> 0 siblings, 2 replies; 54+ messages in thread From: Miklos Szeredi @ 2007-04-13 14:05 UTC (permalink / raw) To: serue Cc: linuxram, serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel > > Thinking a bit more about this, I'm quite sure most users wouldn't > > even want private namespaces. It would be enough to > > > > chroot /share/$USER > > > > and be done with it. > > > > Private namespaces are only good for keeping a bunch of mounts > > referenced by a group of processes. But my guess is, that the natural > > behavior for users is to see a persistent set of mounts. > > > > If for example they mount something on a remote machine, then log out > > from the ssh session and later log back in, they would want to see > > their previous mount still there. > > > > Miklos > > Agreed on desired behavior, but not on chroot sufficing. It actually > sounds like you want exactly what was outlined in the OLS paper. > > Users still need to be in a different mounts namespace from the admin > user so long as we consider the deluser and backup problems I don't think it matters, because /share/$USER duplicates a part or the whole of the user's namespace. So backup would have to be taught about /share anyway, and deluser operates on /home/$USER and not on /share/*, so there shouldn't be any problem. There's actually very little difference between rbind+chroot, and CLONE_NEWNS. In a private namespace: 1) when no more processes reference the namespace, the tree will be disbanded 2) the mount tree won't be accessible from outside the namespace Wanting a persistent namespace contradicts 1). Wanting a per-user (as opposed to per-session) namespace contradicts 2). 
The namespace _has_ to be accessible from outside, so that a new session can access/copy it. So both requirements point to the rbind/chroot solution. Miklos
* Re: [patch 0/8] unprivileged mount syscall 2007-04-13 14:05 ` Miklos Szeredi @ 2007-04-13 21:44 ` Serge E. Hallyn [not found] ` <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> [not found] ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> 1 sibling, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-13 21:44 UTC (permalink / raw) To: Miklos Szeredi Cc: serue, linuxram, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel Quoting Miklos Szeredi (miklos@szeredi.hu): > > > Thinking a bit more about this, I'm quite sure most users wouldn't > > > even want private namespaces. It would be enough to > > > > > > chroot /share/$USER > > > > > > and be done with it. > > > > > > Private namespaces are only good for keeping a bunch of mounts > > > referenced by a group of processes. But my guess is, that the natural > > > behavior for users is to see a persistent set of mounts. > > > > > > If for example they mount something on a remote machine, then log out > > > from the ssh session and later log back in, they would want to see > > > their previous mount still there. > > > > > > Miklos > > > > Agreed on desired behavior, but not on chroot sufficing. It actually > > sounds like you want exactly what was outlined in the OLS paper. > > > > Users still need to be in a different mounts namespace from the admin > > user so long as we consider the deluser and backup problems > > I don't think it matters, because /share/$USER duplicates a part or > the whole of the user's namespace. > > So backup would have to be taught about /share anyway, and deluser > operates on /home/$USER and not on /share/*, so there shouldn't be any > problem. In what I was thinking of, /share/$USER is bind mounted to ~$USER/share, so it would have to be done in a private namespace in order for deluser to not be tricked. > There's actually very little difference between rbind+chroot, and > CLONE_NEWNS. 
In a private namespace: > > 1) when no more processes reference the namespace, the tree will be > disbanded > > 2) the mount tree won't be accessible from outside the namespace But it *can* be, if properly set up. That's part of the point of the example in the OLS paper. When a user logs in, sshd clones a new namespace, then bind-mounts /share/$USER into ~$USER/share. So assuming that /share/$USER was --make-shared'd, it and ~$USER are now in the same peer group, and any changes made by the user under ~$USER will be reflected back into /share/$USER. > Wanting a persistent namespace contradicts 1). Not necessarily, see above. > Wanting a per-user (as opposed to per-session) namespace contradicts > 2). The namespace _has_ to be accessible from outside, so that a new > session can access/copy it. Again, I *think* you are wrong that private namespace contradicts this requirement. > So both requirements point to the rbind/chroot solution. It all points to a combination of the two :-) -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
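The login flow described in the message above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `login_session_cmds` is hypothetical, the home directory is passed in explicitly, and the step where sshd or pam clones a new mount namespace (CLONE_NEWNS) is not shown. The helper prints the mount(8) commands instead of running them, since they need root.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print, without running, the
# mount steps for one login session. Nothing is actually mounted.
login_session_cmds() {
    user=$1
    home=$2
    # One-time admin step assumed by the mail: /share/$user is a shared mount.
    echo "mount --make-shared /share/$user"
    # Per-login step, meant to run inside the freshly cloned namespace: the
    # bind joins the peer group of /share/$user, so mounts propagate both ways.
    echo "mount --bind /share/$user $home/share"
}

login_session_cmds alice /home/alice
```

Because the bind of a shared mount becomes a peer of it, a mount made by the user under the bind target would also show up under /share/$USER in the admin namespace, which is what keeps the tree alive after the session's namespace goes away.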
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org> @ 2007-04-15 20:39 ` Miklos Szeredi [not found] ` <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-15 20:39 UTC (permalink / raw) To: serue-r/Jw6+rmf7HQT0dZR+AlfA Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, serue-r/Jw6+rmf7HQT0dZR+AlfA, linuxram-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA > > > Agreed on desired behavior, but not on chroot sufficing. It actually > > > sounds like you want exactly what was outlined in the OLS paper. > > > > > > Users still need to be in a different mounts namespace from the admin > > > user so long as we consider the deluser and backup problems > > > > I don't think it matters, because /share/$USER duplicates a part or > > the whole of the user's namespace. > > > > So backup would have to be taught about /share anyway, and deluser > > operates on /home/$USER and not on /share/*, so there shouldn't be any > > problem. > > In what I was thinking of, /share/$USER is bind mounted to > ~$USER/share, so it would have to be done in a private namespace in > order for deluser to not be tricked. But /share/$USER is surely not bind mounted to ~$USER/share in the _global_ namespace, is it? I can't see any sense in that. > > There's actually very little difference between rbind+chroot, and > > CLONE_NEWNS. In a private namespace: > > > > 1) when no more processes reference the namespace, the tree will be > > disbanded > > > > 2) the mount tree won't be accessible from outside the namespace > > But it *can* be, if properly set up. That's part of the point of the > example in the OLS paper. 
When a user logs in, sshd clones a new > namespace, then bind-mounts /share/$USER into ~$USER/share. So assuming > that /share/$USER was --make-shared'd, it and ~$USER are now in the > same peer group, and any changes made by the user under ~$USER will > be reflected back into /share/$USER. I acknowledge that it can be done. My point was that it can be done more simply _without_ using CLONE_NEWNS. > > Wanting a persistent namespace contradicts 1). > > Not necessarily, see above. > > > Wanting a per-user (as opposed to per-session) namespace contradicts > > 2). The namespace _has_ to be accessible from outside, so that a new > > session can access/copy it. > > Again, I *think* you are wrong that private namespace contradicts this > requirement. I'm not saying there's any contradiction, I'm saying rbind+chroot is a better fit. I haven't yet heard a single reason why a per-session namespace with parts shared per-user is better than just a per-user namespace. Miklos
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> @ 2007-04-16 1:11 ` Serge E. Hallyn 0 siblings, 0 replies; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-16 1:11 UTC (permalink / raw) To: Miklos Szeredi Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA, linuxram-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org): > > > > Agreed on desired behavior, but not on chroot sufficing. It actually > > > > sounds like you want exactly what was outlined in the OLS paper. > > > > > > > > Users still need to be in a different mounts namespace from the admin > > > > user so long as we consider the deluser and backup problems > > > > > > I don't think it matters, because /share/$USER duplicates a part or > > > the whole of the user's namespace. > > > > > > So backup would have to be taught about /share anyway, and deluser > > > operates on /home/$USER and not on /share/*, so there shouldn't be any > > > problem. > > > > In what I was thinking of, /share/$USER is bind mounted to > > ~$USER/share, so it would have to be done in a private namespace in > > order for deluser to not be tricked. > > But /share/$USER is surely not bind mounted to ~$USER/share in the > _global_ namespace, is it? I can't see any sense in that. No it's not, only in the private namespace. > > > There's actually very little difference between rbind+chroot, and > > > CLONE_NEWNS. In a private namespace: > > > > > > 1) when no more processes reference the namespace, the tree will be > > > disbanded > > > > > > 2) the mount tree won't be accessible from outside the namespace > > > > But it *can* be, if properly set up. That's part of the point of the > > example in the OLS paper. 
When a user logs in, sshd clones a new > > namespace, then bind-mounts /share/$USER into ~$USER/share. So assuming > > that /share/$USER was --make-shared'd, it and ~$USER are now in the > > same peer group, and any changes made by the user under ~$USER will > > be reflected back into /share/$USER. > > I acknowledge, that it can be done. My point was that it can be done > more simply _without_ using CLONE_NS. Seems like a matter of preference, but I see what you're saying. > > > Wanting a persistent namespace contradicts 1). > > > > Not necessarily, see above. > > > > > Wanting a per-user (as opposed to per-session) namespace contradicts > > > 2). The namespace _has_ to be accessible from outside, so that a new > > > session can access/copy it. > > > > Again, I *think* you are wrong that private namespace contradicts this > > requirement. > > I'm not saying there's any contradiction, I'm saying rbind+chroot is a > better fit. Ok, I see. > I haven't yet heard a single reason why a per-session namespace with > parts shared per-user is better than just a per-user namespace. In fact I suspect we could show that they are functionally equivalent (for your purposes) by drawing the fs tree and peer groups from current->fs->root on up for both methods. And not using private namespaces leaves the admin (at least for now) better able to diagnose the state of the system. -serge ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> @ 2007-04-16 8:18 ` Ram Pai [not found] ` <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Ram Pai @ 2007-04-16 8:18 UTC (permalink / raw) To: Miklos Szeredi Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Fri, 2007-04-13 at 16:05 +0200, Miklos Szeredi wrote: > > > Thinking a bit more about this, I'm quite sure most users wouldn't > > > even want private namespaces. It would be enough to > > > > > > chroot /share/$USER > > > > > > and be done with it. > > > > > > Private namespaces are only good for keeping a bunch of mounts > > > referenced by a group of processes. But my guess is, that the natural > > > behavior for users is to see a persistent set of mounts. > > > > > > If for example they mount something on a remote machine, then log out > > > from the ssh session and later log back in, they would want to see > > > their previous mount still there. > > > > > > Miklos > > > > Agreed on desired behavior, but not on chroot sufficing. It actually > > sounds like you want exactly what was outlined in the OLS paper. > > > > Users still need to be in a different mounts namespace from the admin > > user so long as we consider the deluser and backup problems > > I don't think it matters, because /share/$USER duplicates a part or > the whole of the user's namespace. > > So backup would have to be taught about /share anyway, and deluser > operates on /home/$USER and not on /share/*, so there shouldn't be any > problem. > > There's actually very little difference between rbind+chroot, and > CLONE_NEWNS. 
In a private namespace: > > 1) when no more processes reference the namespace, the tree will be > disbanded > > 2) the mount tree won't be accessible from outside the namespace > > Wanting a persistent namespace contradicts 1). > > Wanting a per-user (as opposed to per-session) namespace contradicts > 2). The namespace _has_ to be accessible from outside, so that a new > session can access/copy it. As I mentioned in the previous mail, disbanding all the namespaces of a user will not disband his mount tree, because a mirror of the mount tree still continues to exist in /share/$USER in the admin namespace. And a new user session can always use this copy to create a namespace that looks identical to the one that existed earlier. > > So both requirements point to the rbind/chroot solution. Aren't there ways to escape chroot jails? Serge had pointed me to a URL which showed chroots can be escaped. And if that is true, then having all users' private mount trees in the same namespace can be a security issue? RP > > Miklos
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org> @ 2007-04-16 9:27 ` Miklos Szeredi 2007-04-16 15:40 ` Eric W. Biederman 0 siblings, 1 reply; 54+ messages in thread From: Miklos Szeredi @ 2007-04-16 9:27 UTC (permalink / raw) To: linuxram-r/Jw6+rmf7HQT0dZR+AlfA Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA > Arn't there ways to escape chroot jails? Serge had pointed me to a URL > which showed chroots can be escaped. And if that is true than having all > user's private mount tree in the same namespace can be a security issue? No. In fact chrooting the user into /share/$USER will actually _grant_ a privilege to the user, instead of taking it away. It allows the user to modify its root namespace, which it wouldn't be able to do in the initial namespace. So even if the user could escape from the chroot (which I doubt), s/he would not be able to do any harm, since unprivileged mounting would be restricted to /share. Also /share/$USER should only have read/search permission for $USER or no permissions at all, which would mean that other users' namespaces would be safe from tampering as well. Miklos
* Re: [patch 0/8] unprivileged mount syscall 2007-04-16 9:27 ` Miklos Szeredi @ 2007-04-16 15:40 ` Eric W. Biederman [not found] ` <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org> 0 siblings, 1 reply; 54+ messages in thread From: Eric W. Biederman @ 2007-04-16 15:40 UTC (permalink / raw) To: Miklos Szeredi Cc: linuxram, containers, linux-fsdevel, akpm, util-linux-ng, linux-kernel Miklos Szeredi <miklos@szeredi.hu> writes: >> Arn't there ways to escape chroot jails? Serge had pointed me to a URL >> which showed chroots can be escaped. And if that is true than having all >> user's private mount tree in the same namespace can be a security issue? > > No. In fact chrooting the user into /share/$USER will actually > _grant_ a privilege to the user, instead of taking it away. It allows > the user to modify it's root namespace, which it wouldn't be able to > in the initial namespace. > > So even if the user could escape from the chroot (which I doubt), s/he > would not be able to do any harm, since unprivileged mounting would be > restricted to /share. Also /share/$USER should only have read/search > permission for $USER or no permissions at all, which would mean, that > other users' namespaces would be safe from tampering as well. A couple of points. - chroot can be escaped, it is just a chdir for the root directory it is not a security feature. The only security is that you have to be root to call chdir. A carefully done namespace setup won't have that issue. - While it may not violate security as far as what a user is allowed to modify it may violate security as far as what a user is allowed to see. There are interesting per login cases as well such as allowing a user to replicate their mount tree from another machine when they log in. When /home is on a network filesystem this can be very practical and can allow propagation of mounts across machines not just across a single login session. 
Eric
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org> @ 2007-04-16 15:55 ` Miklos Szeredi 0 siblings, 0 replies; 54+ messages in thread From: Miklos Szeredi @ 2007-04-16 15:55 UTC (permalink / raw) To: ebiederm-aS9lmoZGLiVWk0Htik3J/w Cc: linuxram-r/Jw6+rmf7HQT0dZR+AlfA, containers-qjLDD68F18O7TbgM5vRIOg, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA > >> Arn't there ways to escape chroot jails? Serge had pointed me to a URL > >> which showed chroots can be escaped. And if that is true than having all > >> user's private mount tree in the same namespace can be a security issue? > > > > No. In fact chrooting the user into /share/$USER will actually > > _grant_ a privilege to the user, instead of taking it away. It allows > > the user to modify it's root namespace, which it wouldn't be able to > > in the initial namespace. > > > > So even if the user could escape from the chroot (which I doubt), s/he > > would not be able to do any harm, since unprivileged mounting would be > > restricted to /share. Also /share/$USER should only have read/search > > permission for $USER or no permissions at all, which would mean, that > > other users' namespaces would be safe from tampering as well. > > A couple of points. > - chroot can be escaped, it is just a chdir for the root directory > it is not a security feature. The only security is that you have to > be root to call chdir. A carefully done namespace setup won't have > that issue. > > - While it may not violate security as far as what a user is allowed > to modify it may violate security as far as what a user is allowed > to see. I think that's just up to the permissions in the global namespace. In this example if you 'chmod 0 /share' there won't be anything for the user to see. 
> There are interesting per login cases as well such as allowing a > user to replicate their mount tree from another machine when they > log in. When /home is on a network filesystem this can be very > practical and can allow propagation of mounts across machines not > just across a single login session. Yeah, sounds interesting, but I think it's better to get the basics working first, and then we can start to think about the extras. Btw, there's nothing that prevents cloning the namespace _after_ chrooting into the per-user tree. That would still be simpler than doing it the other way round: first creating per-session namespaces and then setting up mount propagation between them. Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> 2007-04-13 13:28 ` Serge E. Hallyn @ 2007-04-16 7:59 ` Ram Pai 1 sibling, 0 replies; 54+ messages in thread From: Ram Pai @ 2007-04-16 7:59 UTC (permalink / raw) To: Miklos Szeredi Cc: serue-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Fri, 2007-04-13 at 13:58 +0200, Miklos Szeredi wrote: > > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote: > > > > 1. clone the master namespace. > > > > > > > > 2. in the new namespace > > > > > > > > move the tree under /share/$me to / > > > > for each ($user, $what, $how) { > > > > move /share/$user/$what to /$what > > > > if ($how == slave) { > > > > make the mount tree under /$what as slave > > > > } > > > > } > > > > > > > > 3. in the new namespace make the tree under > > > > /share as private and unmount /share > > > > > > Thanks. I get the basic idea now: the namespace itself need not be > > > shared between the sessions, it is enough if "share" propagation is > > > set up between the different namespaces of a user. > > > > > > I don't yet see either in your or Viro's description how the trees > > > under /share/$USER are initialized. I guess they are recursively > > > bound from /, and are made slaves. > > > > yes. I suppose, when a userid is created one of the steps would be > > > > mount --rbind / /share/$USER > > mount --make-rslave /share/$USER > > mount --make-rshared /share/$USER > > Thinking a bit more about this, I'm quite sure most users wouldn't > even want private namespaces. It would be enough to > > chroot /share/$USER > > and be done with it. > > Private namespaces are only good for keeping a bunch of mounts > referenced by a group of processes. 
But my guess is, that the natural > behavior for users is to see a persistent set of mounts. > > If for example they mount something on a remote machine, then log out > from the ssh session and later log back in, they would want to see > their previous mount still there. They will continue to see their previous mount tree. Even if all the namespaces belonging to the different sessions of the user get dismantled when all the sessions exit, a mirror of those mount trees continues to exist under /share/$USER in the original namespace. So I don't think we have an issue. NOTE: when I say 'original namespace' I mean the admin namespace; the first namespace that gets created when the machine boots. RP > > Miklos
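The per-user tree setup quoted in the message above can be sketched as a small shell helper. This is only an illustration: the function name `user_tree_cmds` is hypothetical, and it prints the three mount(8) commands from the quoted mail rather than running them, since they need root and a kernel with shared-subtree support.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the commands that would
# initialise /share/$user as described in the quoted steps. Nothing is mounted.
user_tree_cmds() {
    user=$1
    # Recursively bind the whole tree under the per-user directory.
    echo "mount --rbind / /share/$user"
    # Receive mount events from the original tree, but do not send any back.
    echo "mount --make-rslave /share/$user"
    # Let later clones or binds of this tree form a peer group with it.
    echo "mount --make-rshared /share/$user"
}

user_tree_cmds alice
```

The order matters in this sketch: making the tree a slave first and shared afterwards leaves it receiving events from / while still able to propagate among its own later copies.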
* Re: [patch 0/8] unprivileged mount syscall 2007-04-13 11:58 ` Miklos Szeredi [not found] ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org> @ 2007-04-13 20:07 ` Karel Zak [not found] ` <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org> 1 sibling, 1 reply; 54+ messages in thread From: Karel Zak @ 2007-04-13 20:07 UTC (permalink / raw) To: Miklos Szeredi Cc: linuxram, serue, akpm, linux-fsdevel, containers, util-linux-ng, linux-kernel On Fri, Apr 13, 2007 at 01:58:59PM +0200, Miklos Szeredi wrote: > > On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote: > > > > 1. clone the master namespace. > > > > > > > > 2. in the new namespace > > > > > > > > move the tree under /share/$me to / > > > > for each ($user, $what, $how) { > > > > move /share/$user/$what to /$what > > > > if ($how == slave) { > > > > make the mount tree under /$what as slave > > > > } > > > > } > > > > > > > > 3. in the new namespace make the tree under > > > > /share as private and unmount /share > > > > > > Thanks. I get the basic idea now: the namespace itself need not be > > > shared between the sessions, it is enough if "share" propagation is > > > set up between the different namespaces of a user. > > > > > > I don't yet see either in your or Viro's description how the trees > > > under /share/$USER are initialized. I guess they are recursively > > > bound from /, and are made slaves. > > > > yes. I suppose, when a userid is created one of the steps would be > > > > mount --rbind / /share/$USER > > mount --make-rslave /share/$USER > > mount --make-rshared /share/$USER > > Thinking a bit more about this, I'm quite sure most users wouldn't > even want private namespaces. It would be enough to > > chroot /share/$USER > > and be done with it. I don't think so. How to you want to implement non-shared /tmp directories? The chroot is overkill in this case. 
See: http://www.coker.com.au/selinux/talks/sage-2006/PolyInstantiatedDirectories.html http://danwalsh.livejournal.com/ > Private namespaces are only good for keeping a bunch of mounts > referenced by a group of processes. But my guess is, that the natural > behavior for users is to see a persistent set of mounts. > > If for example they mount something on a remote machine, then log out > from the ssh session and later log back in, they would want to see > their previous mount still there. They can mount to /mnt where the directory is shared ("mount --make-shared /mnt") and visible in all namespaces. I think /share/$USER is an extreme example. You can find more situations where private namespaces are a nice solution. Karel -- Karel Zak <kzak@redhat.com>
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org> @ 2007-04-15 20:21 ` Miklos Szeredi 0 siblings, 0 replies; 54+ messages in thread From: Miklos Szeredi @ 2007-04-15 20:21 UTC (permalink / raw) To: kzak-H+wXaHxf7aLQT0dZR+AlfA Cc: linuxram-r/Jw6+rmf7HQT0dZR+AlfA, serue-r/Jw6+rmf7HQT0dZR+AlfA, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg, util-linux-ng-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA > > Thinking a bit more about this, I'm quite sure most users wouldn't > > even want private namespaces. It would be enough to > > > > chroot /share/$USER > > > > and be done with it. > > I don't think so. How to you want to implement non-shared /tmp > directories? mount --bind /.tmp/$USER /share/$USER/tmp or whatever else this polyunsaturated thingy does within the cloned namespace. > The chroot is overkill in this case. What do you mean it's an overkill? clone(CLONE_NS) duplicates all the mounts, just as mount --rbind does. > > Private namespaces are only good for keeping a bunch of mounts > > referenced by a group of processes. But my guess is, that the natural > > behavior for users is to see a persistent set of mounts. > > > > If for example they mount something on a remote machine, then log out > > from the ssh session and later log back in, they would want to see > > their previous mount still there. > > They can mount to /mnt where the directory is shared ("mount > --make-shared /mnt") and visible and all namespaces. > > I think /share/$USER is an extreme example. You can found more > situations when private namespaces are nice solution. Private to a single login session? I'd like to hear examples. Thanks, Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall [not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> 2007-04-06 23:02 ` Andrew Morton @ 2007-04-09 22:00 ` Serge E. Hallyn 2007-04-11 10:32 ` Miklos Szeredi 1 sibling, 1 reply; 54+ messages in thread From: Serge E. Hallyn @ 2007-04-09 22:00 UTC (permalink / raw) To: Miklos Szeredi Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, util-linux-ng-u79uwXL29TY76Z2rM5mHXA Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org): > This patchset adds support for keeping mount ownership information in > the kernel, and allow unprivileged mount(2) and umount(2) in certain > cases. Well, I'd like to feel all smart and point out some bugs, but the code all reads very nicely, seems to work as advertised, and while I won't have ltp results until tomorrow, boot test results in so far are all successful. Looks good. -serge > This can be useful for the following reasons: > > - mount(8) can store ownership ("user=XY" option) in the kernel > instead, or in addition to storing it in /etc/mtab. For example if > private namespaces are used with mount propagations /etc/mtab > becomes unworkable, but using /proc/mounts works fine > > - fuse won't need a special suid-root mount/umount utility. Plain > umount(8) can easily be made to work with unprivileged fuse mounts > > - users can use bind mounts without having to pre-configure them in > /etc/fstab > > All this is done in a secure way, and unprivileged bind and fuse > mounts are disabled by default and can be enabled through sysctl or > /proc/sys. > > One thing that is missing from this series is the ability to restrict > user mounts to private namespaces. The reason is that private > namespaces have still not gained the momentum and support needed for > painless user experience. So such a feature would not yet get enough > attention and testing. 
However adding such an optional restriction > can be done with minimal changes in the future, once private > namespaces have matured. > > An earlier version of these patches have been discussed here: > > http://lkml.org/lkml/2005/5/3/64 > > -- > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [patch 0/8] unprivileged mount syscall 2007-04-09 22:00 ` Serge E. Hallyn @ 2007-04-11 10:32 ` Miklos Szeredi 0 siblings, 0 replies; 54+ messages in thread From: Miklos Szeredi @ 2007-04-11 10:32 UTC (permalink / raw) To: serue; +Cc: akpm, linux-fsdevel, util-linux-ng > > This patchset adds support for keeping mount ownership information in > > the kernel, and allow unprivileged mount(2) and umount(2) in certain > > cases. > > Well, I'd like to feel all smart and point out some bugs, but the code > all reads very nicely, seems to work as advertised, and while I won't > have ltp results until tomorrow, boot test results in so far are all > successful. > > Looks good. Thanks for the review and testing! Miklos ^ permalink raw reply [flat|nested] 54+ messages in thread
Thread overview: 54+ messages
2007-04-04 18:30 [patch 0/8] unprivileged mount syscall Miklos Szeredi
2007-04-04 18:30 ` [patch 1/8] add user mounts to the kernel Miklos Szeredi
2007-04-04 18:30 ` [patch 2/8] allow unprivileged umount Miklos Szeredi
2007-04-04 18:30 ` [patch 3/8] account user mounts Miklos Szeredi
2007-04-04 18:30 ` [patch 4/8] propagate error values from clone_mnt Miklos Szeredi
2007-04-04 18:30 ` [patch 5/8] allow unprivileged bind mounts Miklos Szeredi
2007-04-04 18:30 ` [patch 6/8] put declaration of put_filesystem() in fs.h Miklos Szeredi
2007-04-04 18:30 ` [patch 7/8] allow unprivileged mounts Miklos Szeredi
2007-04-04 18:30 ` [patch 8/8] allow unprivileged fuse mounts Miklos Szeredi
2007-04-09 18:57 ` [patch 0/8] unprivileged mount syscall Serge E. Hallyn
2007-04-09 20:14 ` Miklos Szeredi
2007-04-09 20:55 ` Serge E. Hallyn
[not found] ` <20070409205506.GC20226-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-11 19:43 ` Miklos Szeredi
[not found] ` <E1Hbiih-00060L-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-11 20:05 ` Serge E. Hallyn
2007-04-11 20:41 ` Miklos Szeredi
2007-04-11 20:57 ` Serge E. Hallyn
[not found] ` <20070404183012.429274832-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org>
2007-04-06 23:02 ` Andrew Morton
2007-04-06 23:16 ` H. Peter Anvin
2007-04-06 23:55 ` Jan Engelhardt
2007-04-07 0:22 ` H. Peter Anvin
2007-04-07 3:40 ` Eric Van Hensbergen
[not found] ` <a4e6962a0704062040q12c0013ek9591b9fbb27caa12-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-04-07 6:48 ` Miklos Szeredi
2007-04-10 8:52 ` Ian Kent
[not found] ` <1176195125.3476.47.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
2007-04-11 10:48 ` Miklos Szeredi
2007-04-11 13:48 ` Ian Kent
[not found] ` <1176299311.3377.6.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
2007-04-11 14:26 ` Serge E. Hallyn
[not found] ` <20070411142608.GC30460-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-11 14:27 ` Ian Kent
[not found] ` <1176301632.3377.9.camel-J+SFD3YVfrQ/gntp4R1GGQ@public.gmane.org>
2007-04-11 14:45 ` Serge E. Hallyn
2007-04-07 6:41 ` Miklos Szeredi
[not found] ` <E1Ha4cN-0004rc-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-09 14:38 ` Serge E. Hallyn
[not found] ` <20070409143802.GB4891-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-09 16:24 ` Miklos Szeredi
2007-04-09 17:07 ` Serge E. Hallyn
2007-04-09 17:46 ` Ram Pai
2007-04-09 18:25 ` H. Peter Anvin
2007-04-10 10:33 ` Karel Zak
2007-04-09 20:10 ` Miklos Szeredi
2007-04-10 8:38 ` Ram Pai
2007-04-11 10:44 ` Miklos Szeredi
[not found] ` <E1HbaJV-00059N-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-11 18:28 ` Ram Pai
[not found] ` <1176316116.2811.39.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
2007-04-13 11:58 ` Miklos Szeredi
[not found] ` <E1HcKQd-0001yO-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-13 13:28 ` Serge E. Hallyn
2007-04-13 14:05 ` Miklos Szeredi
2007-04-13 21:44 ` Serge E. Hallyn
[not found] ` <20070413214415.GA28629-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-04-15 20:39 ` Miklos Szeredi
[not found] ` <E1HdBVc-0005pL-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-16 1:11 ` Serge E. Hallyn
[not found] ` <E1HcMOq-0002As-00-VFwzv6uONVrxNFs70CDYszOMxtEWgIxa@public.gmane.org>
2007-04-16 8:18 ` Ram Pai
[not found] ` <1176711509.9488.4.camel-kj2lFfaA5cHMbYB6QlFGEg@public.gmane.org>
2007-04-16 9:27 ` Miklos Szeredi
2007-04-16 15:40 ` Eric W. Biederman
[not found] ` <m1d524l43w.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-04-16 15:55 ` Miklos Szeredi
2007-04-16 7:59 ` Ram Pai
2007-04-13 20:07 ` Karel Zak
[not found] ` <20070413200720.GS31445-CxBs/XhZ2BtHjqfyn1fVYA@public.gmane.org>
2007-04-15 20:21 ` Miklos Szeredi
2007-04-09 22:00 ` Serge E. Hallyn
2007-04-11 10:32 ` Miklos Szeredi