* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces [not found] ` <1461699046-30485-4-git-send-email-seth.forshee@canonical.com> @ 2016-05-17 22:39 ` Eric W. Biederman 2016-05-17 23:58 ` Seth Forshee 0 siblings, 1 reply; 6+ messages in thread From: Eric W. Biederman @ 2016-05-17 22:39 UTC (permalink / raw) To: Seth Forshee Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel, linux-security-module, selinux, cgroups Seth Forshee <seth.forshee@canonical.com> writes: > Both of these filesystems already have use cases for mounting the > same super block from multiple user namespaces. For sysfs this > happens when using criu for snapshotting a container, where sysfs > is mnounted in the containers network ns but the hosts user ns. > The cgroup filesystem shares the same super block for all mounts > of the same hierarchy regardless of the namespace. > > As a result, the restriction on mounting a super block from a > single user namespace creates regressions for existing uses of > these filesystems. For these specific filesystems this > restriction isn't really necessary since the backing store is > objects in kernel memory and thus the ids assigned from inodes > is not subject to translation relative to s_user_ns. > > Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set > causes sget_userns() to skip the check of s_user_ns. Set this > flag for the sysfs and cgroup filesystems to fix the > regressions. So this one needs to be sget_userns(..., &init_user_ns, ...). And not a new special case. Apologies for not catching this earlier. I am looking at folding all of this into the patch that introduces sget_userns so that even bisects won't have regresssions. We loose the ability to call mount -o remount and actually affect these filesystems (which we don't have without s_user_ns) but we gain a whole lot of simplicity, and we don't break the xattr and security label on sysfs code. Eric ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces 2016-05-17 22:39 ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces Eric W. Biederman @ 2016-05-17 23:58 ` Seth Forshee 2016-05-18 15:45 ` Eric W. Biederman 0 siblings, 1 reply; 6+ messages in thread From: Seth Forshee @ 2016-05-17 23:58 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel, linux-security-module, selinux, cgroups On Tue, May 17, 2016 at 05:39:33PM -0500, Eric W. Biederman wrote: > Seth Forshee <seth.forshee@canonical.com> writes: > > > Both of these filesystems already have use cases for mounting the > > same super block from multiple user namespaces. For sysfs this > > happens when using criu for snapshotting a container, where sysfs > > is mnounted in the containers network ns but the hosts user ns. > > The cgroup filesystem shares the same super block for all mounts > > of the same hierarchy regardless of the namespace. > > > > As a result, the restriction on mounting a super block from a > > single user namespace creates regressions for existing uses of > > these filesystems. For these specific filesystems this > > restriction isn't really necessary since the backing store is > > objects in kernel memory and thus the ids assigned from inodes > > is not subject to translation relative to s_user_ns. > > > > Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set > > causes sget_userns() to skip the check of s_user_ns. Set this > > flag for the sysfs and cgroup filesystems to fix the > > regressions. > > So this one needs to be sget_userns(..., &init_user_ns, ...). > And not a new special case. This is actually what I wanted to do, but based on a previous discussion where I had suggested doing this (for a different reason) I came away thinking you did not want it that way. So I'm happy with that change. But if we do that it violates some of the assumptions of the patch to rework MNT_NODEV on your testing branch (and also those behind patch 2 in this series). Something will need to be changed there to prevent a regression in mount behavior when a user ns tries to mount without MNT_NODEV when the mount inherited from its parent has it set. > Apologies for not catching this earlier. Actually this is a more recent patch, so you possibly hadn't seen it before. > I am looking at folding all of this into the patch that introduces > sget_userns so that even bisects won't have regresssions. That's fine with me. Thanks, Seth ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces 2016-05-17 23:58 ` Seth Forshee @ 2016-05-18 15:45 ` Eric W. Biederman 2016-05-18 16:16 ` Seth Forshee 0 siblings, 1 reply; 6+ messages in thread From: Eric W. Biederman @ 2016-05-18 15:45 UTC (permalink / raw) To: Seth Forshee Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel, linux-security-module, selinux, cgroups Seth Forshee <seth.forshee@canonical.com> writes: > On Tue, May 17, 2016 at 05:39:33PM -0500, Eric W. Biederman wrote: >> Seth Forshee <seth.forshee@canonical.com> writes: >> >> > Both of these filesystems already have use cases for mounting the >> > same super block from multiple user namespaces. For sysfs this >> > happens when using criu for snapshotting a container, where sysfs >> > is mnounted in the containers network ns but the hosts user ns. >> > The cgroup filesystem shares the same super block for all mounts >> > of the same hierarchy regardless of the namespace. >> > >> > As a result, the restriction on mounting a super block from a >> > single user namespace creates regressions for existing uses of >> > these filesystems. For these specific filesystems this >> > restriction isn't really necessary since the backing store is >> > objects in kernel memory and thus the ids assigned from inodes >> > is not subject to translation relative to s_user_ns. >> > >> > Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set >> > causes sget_userns() to skip the check of s_user_ns. Set this >> > flag for the sysfs and cgroup filesystems to fix the >> > regressions. >> >> So this one needs to be sget_userns(..., &init_user_ns, ...). >> And not a new special case. > > This is actually what I wanted to do, but based on a previous discussion > where I had suggested doing this (for a different reason) I came away > thinking you did not want it that way. So I'm happy with that change. Yeah. Somedays it seems like there are a lot of pieces in play here. The security labels on sysfs seems to be a very compelling case. > But if we do that it violates some of the assumptions of the patch to > rework MNT_NODEV on your testing branch (and also those behind patch 2 > in this series). Something will need to be changed there to prevent a > regression in mount behavior when a user ns tries to mount without > MNT_NODEV when the mount inherited from its parent has it set. Thank you for pointing that out. I will look into that. I believe I know exactly what you are talking about. Of the choices I think it is better to a minor localized change in the fs_fully_visible logic than it is to cause problems elsewhere. >> Apologies for not catching this earlier. > > Actually this is a more recent patch, so you possibly hadn't seen it > before. > >> I am looking at folding all of this into the patch that introduces >> sget_userns so that even bisects won't have regresssions. > > That's fine with me. And thank you for keeping everything as separate patches. That is at least helping me catch up. Even if I don't agree that these things should be separate come merge time. Eric ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces 2016-05-18 15:45 ` Eric W. Biederman @ 2016-05-18 16:16 ` Seth Forshee 2016-05-18 16:27 ` Eric W. Biederman 0 siblings, 1 reply; 6+ messages in thread From: Seth Forshee @ 2016-05-18 16:16 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel, linux-security-module, selinux, cgroups On Wed, May 18, 2016 at 10:45:31AM -0500, Eric W. Biederman wrote: > > But if we do that it violates some of the assumptions of the patch to > > rework MNT_NODEV on your testing branch (and also those behind patch 2 > > in this series). Something will need to be changed there to prevent a > > regression in mount behavior when a user ns tries to mount without > > MNT_NODEV when the mount inherited from its parent has it set. > > Thank you for pointing that out. I will look into that. > > I believe I know exactly what you are talking about. Of the choices I > think it is better to a minor localized change in the fs_fully_visible > logic than it is to cause problems elsewhere. Agreed. > >> Apologies for not catching this earlier. > > > > Actually this is a more recent patch, so you possibly hadn't seen it > > before. > > > >> I am looking at folding all of this into the patch that introduces > >> sget_userns so that even bisects won't have regresssions. > > > > That's fine with me. > > And thank you for keeping everything as separate patches. That is at > least helping me catch up. Even if I don't agree that these things > should be separate come merge time. Honestly I probably would have squashed some of them into that first patch myself if you hadn't already applied it to your testing branch, so that's all just luck. Keep in mind that I also have that patch for mqueue that isn't in this series, and I haven't yet checked to see if the 4.7 merges introduce anything which is going to require updating these patches. I was planning to wait and send out updates after -rc1, but if you want that stuff sooner just let me know. Thanks, Seth ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces 2016-05-18 16:16 ` Seth Forshee @ 2016-05-18 16:27 ` Eric W. Biederman 0 siblings, 0 replies; 6+ messages in thread From: Eric W. Biederman @ 2016-05-18 16:27 UTC (permalink / raw) To: Seth Forshee Cc: Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan, Johannes Weiner, Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel, linux-security-module, selinux, cgroups Seth Forshee <seth.forshee@canonical.com> writes: > On Wed, May 18, 2016 at 10:45:31AM -0500, Eric W. Biederman wrote: >> > But if we do that it violates some of the assumptions of the patch to >> > rework MNT_NODEV on your testing branch (and also those behind patch 2 >> > in this series). Something will need to be changed there to prevent a >> > regression in mount behavior when a user ns tries to mount without >> > MNT_NODEV when the mount inherited from its parent has it set. >> >> Thank you for pointing that out. I will look into that. >> >> I believe I know exactly what you are talking about. Of the choices I >> think it is better to a minor localized change in the fs_fully_visible >> logic than it is to cause problems elsewhere. > > Agreed. > >> >> Apologies for not catching this earlier. >> > >> > Actually this is a more recent patch, so you possibly hadn't seen it >> > before. >> > >> >> I am looking at folding all of this into the patch that introduces >> >> sget_userns so that even bisects won't have regresssions. >> > >> > That's fine with me. >> >> And thank you for keeping everything as separate patches. That is at >> least helping me catch up. Even if I don't agree that these things >> should be separate come merge time. > > Honestly I probably would have squashed some of them into that first > patch myself if you hadn't already applied it to your testing branch, so > that's all just luck. > > Keep in mind that I also have that patch for mqueue that isn't in this > series, and I haven't yet checked to see if the 4.7 merges introduce > anything which is going to require updating these patches. I was > planning to wait and send out updates after -rc1, but if you want that > stuff sooner just let me know. As unfortunately I don't have anything going into -rc1 I am working on this right now. Let me finish sorting out the sget_userns and mnt nodev mess and I will push something out and then we can compare notes. I think I have mqueue covered by other changes. As it is in the set of filesystems that should just use sget_userns. I am sorting through the nodev corner of this now. It should just be a day or two. Eric ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v4 00/21] Support fuse mounts in user namespaces
@ 2016-04-26 19:36 Seth Forshee
2016-04-26 19:36 ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between " Seth Forshee
0 siblings, 1 reply; 6+ messages in thread
From: Seth Forshee @ 2016-04-26 19:36 UTC (permalink / raw)
To: Eric W. Biederman, linux-bcache, dm-devel, linux-raid, linux-mtd,
linux-fsdevel, fuse-devel, cgroups, linux-security-module,
selinux
Cc: Alexander Viro, Serge Hallyn, Richard Weinberger,
Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov,
linux-kernel, Seth Forshee
Hi Eric,
Here's another update to my patches for mouning with fuse from
unpivileged user namespaces. The main change here is a fix for a build
failure when fuse is built as a module. As usual the series is also
available at:
git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/linux.git fuse-userns
Changes since v3:
* Export current_in_userns() to fix an error when fuse is built as a
module.
* Add comment explaining the conditions for allowing CAP_CHOWN in
s_user_ns to change the owner or group of an inode.
* Added acks from Serge.
Thanks,
Seth
---
Andy Lutomirski (1):
fs: Treat foreign mounts as nosuid
Pavel Tikhomirov (1):
fs: fix a posible leak of allocated superblock
Seth Forshee (19):
fs: Remove check of s_user_ns for existing mounts in
fs_fully_visible()
fs: Allow sysfs and cgroupfs to share super blocks between user
namespaces
block_dev: Support checking inode permissions in lookup_bdev()
block_dev: Check permissions towards block device inode when mounting
selinux: Add support for unprivileged mounts from user namespaces
userns: Replace in_userns with current_in_userns
Smack: Handle labels consistently in untrusted mounts
fs: Check for invalid i_uid in may_follow_link()
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Refuse uid/gid changes which don't map into s_user_ns
fs: Update posix_acl support to handle user namespace mounts
fs: Allow superblock owner to change ownership of inodes with
unmappable ids
fs: Don't remove suid for CAP_FSETID in s_user_ns
fs: Allow superblock owner to access do_remount_sb()
capabilities: Allow privileged user in s_user_ns to set security.*
xattrs
fuse: Add support for pid namespaces
fuse: Support fuse filesystems outside of init_user_ns
fuse: Restrict allow_other to the superblock's namespace or a
descendant
fuse: Allow user namespace mounts
drivers/md/bcache/super.c | 2 +-
drivers/md/dm-table.c | 2 +-
drivers/mtd/mtdsuper.c | 2 +-
fs/attr.c | 73 ++++++++++++++++++++++++++++++++++++-----
fs/block_dev.c | 18 ++++++++--
fs/exec.c | 2 +-
fs/fuse/cuse.c | 3 +-
fs/fuse/dev.c | 26 +++++++++++----
fs/fuse/dir.c | 16 ++++-----
fs/fuse/file.c | 22 ++++++++++---
fs/fuse/fuse_i.h | 10 +++++-
fs/fuse/inode.c | 40 ++++++++++++++--------
fs/inode.c | 3 +-
fs/kernfs/inode.c | 2 ++
fs/namei.c | 2 +-
fs/namespace.c | 20 ++++++++---
fs/posix_acl.c | 67 +++++++++++++++++++++++--------------
fs/proc/base.c | 2 ++
fs/proc/generic.c | 3 ++
fs/proc/proc_sysctl.c | 2 ++
fs/quota/quota.c | 2 +-
fs/super.c | 7 +++-
fs/sysfs/mount.c | 3 +-
fs/xattr.c | 19 ++++++++---
include/linux/fs.h | 3 +-
include/linux/mount.h | 1 +
include/linux/posix_acl_xattr.h | 17 +++++++---
include/linux/uidgid.h | 10 ++++++
include/linux/user_namespace.h | 6 ++--
kernel/cgroup.c | 4 +--
kernel/cred.c | 2 ++
kernel/user_namespace.c | 7 ++--
security/commoncap.c | 22 +++++++++----
security/selinux/hooks.c | 25 +++++++++++++-
security/smack/smack_lsm.c | 29 ++++++++++------
35 files changed, 355 insertions(+), 119 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces 2016-04-26 19:36 [PATCH v4 00/21] Support fuse mounts in " Seth Forshee @ 2016-04-26 19:36 ` Seth Forshee 0 siblings, 0 replies; 6+ messages in thread From: Seth Forshee @ 2016-04-26 19:36 UTC (permalink / raw) To: Eric W. Biederman, Alexander Viro, Greg Kroah-Hartman, Jeff Layton, J. Bruce Fields, Tejun Heo, Li Zefan, Johannes Weiner Cc: Serge Hallyn, Richard Weinberger, Austin S Hemmelgarn, Miklos Szeredi, Pavel Tikhomirov, linux-kernel, linux-bcache, dm-devel, linux-raid, linux-mtd, linux-fsdevel, fuse-devel, linux-security-module, selinux, cgroups, Seth Forshee Both of these filesystems already have use cases for mounting the same super block from multiple user namespaces. For sysfs this happens when using criu for snapshotting a container, where sysfs is mounted in the containers network ns but the hosts user ns. The cgroup filesystem shares the same super block for all mounts of the same hierarchy regardless of the namespace. As a result, the restriction on mounting a super block from a single user namespace creates regressions for existing uses of these filesystems. For these specific filesystems this restriction isn't really necessary since the backing store is objects in kernel memory and thus the ids assigned from inodes is not subject to translation relative to s_user_ns. Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set causes sget_userns() to skip the check of s_user_ns. Set this flag for the sysfs and cgroup filesystems to fix the regressions. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> --- fs/super.c | 3 ++- fs/sysfs/mount.c | 3 ++- include/linux/fs.h | 1 + kernel/cgroup.c | 4 ++-- 4 files changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/super.c b/fs/super.c index 092a7828442e..ead156b44bf8 100644 --- a/fs/super.c +++ b/fs/super.c @@ -472,7 +472,8 @@ retry: hlist_for_each_entry(old, &type->fs_supers, s_instances) { if (!test(old, data)) continue; - if (user_ns != old->s_user_ns) { + if (!(type->fs_flags & FS_USERNS_SHARE_SB) && + user_ns != old->s_user_ns) { spin_unlock(&sb_lock); if (s) { up_write(&s->s_umount); diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c index f3db82071cfb..9555accd4322 100644 --- a/fs/sysfs/mount.c +++ b/fs/sysfs/mount.c @@ -59,7 +59,8 @@ static struct file_system_type sysfs_fs_type = { .name = "sysfs", .mount = sysfs_mount, .kill_sb = sysfs_kill_sb, - .fs_flags = FS_USERNS_VISIBLE | FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_VISIBLE | FS_USERNS_MOUNT | + FS_USERNS_SHARE_SB, }; int __init sysfs_init(void) diff --git a/include/linux/fs.h b/include/linux/fs.h index be0f8023e28c..66a639ec1bc4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1988,6 +1988,7 @@ struct file_system_type { #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ #define FS_USERNS_DEV_MOUNT 16 /* A userns mount does not imply MNT_NODEV */ #define FS_USERNS_VISIBLE 32 /* FS must already be visible */ +#define FS_USERNS_SHARE_SB 64 /* Allow sharing sb between userns-es */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ struct dentry *(*mount) (struct file_system_type *, int, const char *, void *); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 671dc05c0b0f..9c9aa27e531a 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2247,14 +2247,14 @@ static struct file_system_type cgroup_fs_type = { .name = "cgroup", .mount = cgroup_mount, .kill_sb = cgroup_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_SHARE_SB, }; static struct file_system_type cgroup2_fs_type = { .name = "cgroup2", .mount = cgroup_mount, .kill_sb = cgroup_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_SHARE_SB, }; static char *cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen, -- 2.7.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-05-18 16:39 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1461699046-30485-1-git-send-email-seth.forshee@canonical.com>
[not found] ` <1461699046-30485-4-git-send-email-seth.forshee@canonical.com>
2016-05-17 22:39 ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces Eric W. Biederman
2016-05-17 23:58 ` Seth Forshee
2016-05-18 15:45 ` Eric W. Biederman
2016-05-18 16:16 ` Seth Forshee
2016-05-18 16:27 ` Eric W. Biederman
2016-04-26 19:36 [PATCH v4 00/21] Support fuse mounts in " Seth Forshee
2016-04-26 19:36 ` [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between " Seth Forshee
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox