* [PATCH v1 0/3] cgroup: allow for unprivileged management @ 2016-07-18 16:18 Aleksa Sarai 2016-07-18 16:18 ` [PATCH v1 1/3] kernfs: add support for custom per-sb permission hooks Aleksa Sarai [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org> 0 siblings, 2 replies; 34+ messages in thread From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw) To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson Cc: linux-kernel, cgroups, Christian Brauner, Aleksa Sarai, dev This is a rewrite of my old cgroup unprivileged subtree management[1] patchset. Rather than magically creating a new cgroup, I've instead modified kernfs so that we can have custom permission hooks. The following only applies to cgroupv2 trees, due to the fact that cgroupv1 doesn't explicitly require that cgroups be hierarchical. You can only create a new subtree if you either would traditionally have write access, or you are attempting to create a new cgroup under the root cgroup of your current cgroup namespace (and you have CAP_SYS_ADMIN in the user namespace pinned by the cgroup namespace). This means that users would only be able to create sub-cgroups of their current cgroup using this method. In addition, I relaxed one of the ancestor restrictions so that you can move to direct descendants of the current cgroup without needing to be able to join the current cgroup you're in (because that restriction doesn't make much sense). [1]: http://marc.info/?l=linux-kernel&m=146319604331859 Cc: dev@opencontainers.org Aleksa Sarai (3): kernfs: add support for custom per-sb permission hooks cgroup: allow for unprivileged subtree management cgroup: relax common ancestor restriction for direct descendants fs/kernfs/inode.c | 13 +++++++- include/linux/kernfs.h | 3 ++ kernel/cgroup.c | 86 +++++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 93 insertions(+), 9 deletions(-) -- 2.9.0 ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v1 1/3] kernfs: add support for custom per-sb permission hooks 2016-07-18 16:18 [PATCH v1 0/3] cgroup: allow for unprivileged management Aleksa Sarai @ 2016-07-18 16:18 ` Aleksa Sarai [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org> 1 sibling, 0 replies; 34+ messages in thread From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw) To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson Cc: linux-kernel, cgroups, Christian Brauner, Aleksa Sarai, dev This allows for users of kernfs to create custom (and possibly less restrictive) permission checks for kernfs_nodes. The default is unchanged. This patch is part of the cgroupns unprivileged subtree management patchset. Cc: dev@opencontainers.org Signed-off-by: Aleksa Sarai <asarai@suse.de> --- fs/kernfs/inode.c | 13 ++++++++++++- include/linux/kernfs.h | 3 +++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c index 63b925d5ba1e..e82b8e5aa643 100644 --- a/fs/kernfs/inode.c +++ b/fs/kernfs/inode.c @@ -364,15 +364,26 @@ void kernfs_evict_inode(struct inode *inode) int kernfs_iop_permission(struct inode *inode, int mask) { struct kernfs_node *kn; + struct kernfs_syscall_ops *scops; + int ret; if (mask & MAY_NOT_BLOCK) return -ECHILD; kn = inode->i_private; + if (!kernfs_get_active(kn)) + return -ENODEV; mutex_lock(&kernfs_mutex); kernfs_refresh_inode(kn, inode); mutex_unlock(&kernfs_mutex); - return generic_permission(inode, mask); + scops = kernfs_root(kn)->syscall_ops; + if (unlikely(scops && scops->permission)) + ret = scops->permission(inode, kn, mask); + else + ret = generic_permission(inode, mask); + + kernfs_put_active(kn); + return ret; } diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 96356ef012de..373b5a888a81 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -16,6 +16,7 @@ #include <linux/rbtree.h> #include <linux/atomic.h> #include <linux/wait.h> +#include <linux/fs.h> struct file; struct dentry; @@ -154,6 +155,8 @@ struct kernfs_syscall_ops { const char *new_name); int (*show_path)(struct seq_file *sf, struct kernfs_node *kn, struct kernfs_root *root); + int (*permission)(struct inode *inode, struct kernfs_node *kn, + int mask); }; struct kernfs_root { -- 2.9.0 ^ permalink raw reply related [flat|nested] 34+ messages in thread
[parent not found: <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>]
* [PATCH v1 2/3] cgroup: allow for unprivileged subtree management [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org> @ 2016-07-18 16:18 ` Aleksa Sarai [not found] ` <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org> 2016-07-18 16:18 ` [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants Aleksa Sarai 1 sibling, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw) To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, Aleksa Sarai, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Use the new custom ->permission hook to allow unprivileged processes to mkdir new sub-cgroup directories of the root_cset of their current cgroup namespace. No process outside of the cgroup namespace (or in a sub-namespace) has this ability, and thus a process must have sufficient privileges to setns to a cgroup namespace in order to create cgroups in a cgroup they are not currently residing in. Only privileged processes in the user namespace pinned to the cgroup namespace have this new ability. This further restricts any oddness from happening with the creation of many cgroups which the process cannot effectively join. This change only applies to the default hierarchy, as cgroupv1 cgroups are not necessarily hierarchical (thus allowing the creating of new sub-cgroups would allow for circumvention of cgroup limits). However, since cgroupv2 cgroups are strictly hierarchical as a design constraint this is possible. It should be noted that cgroupv2 also has attaching restrictions that make this process safe against two complicit processes from migrating a process to the less restrictive cgroup of the two. Cc: dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ@public.gmane.org Signed-off-by: Aleksa Sarai <asarai-l3A5Bk7waGM@public.gmane.org> --- kernel/cgroup.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 8647f3112f5c..4559baa7eabd 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -5490,6 +5490,67 @@ static int cgroup_rmdir(struct kernfs_node *kn) return ret; } +/* + * We have specific rules when deciding if a process can write to a cgroup + * directory, based on their current state inside cgroupns. + */ +static int cgroup_permission(struct inode *inode, struct kernfs_node *kn, + int mask) +{ + int ret; + struct cgroup *cgroup; + struct cgroup_namespace *cgroupns; + + /* + * First, compute the generic_permission return value. In most cases + * this will succeed and we can also avoid duplicating this code. + */ + + cgroup = kn->priv; + cgroup_get(cgroup); + + /* First, try the generic method which should work in most cases. */ + ret = generic_permission(inode, mask); + + /* If the generic check succeeded, then we're all good. */ + if (!ret) + goto out_put_cgroup; + + /* We're only interested in cgroup directories. */ + if (kernfs_type(kn) != KERNFS_DIR) + goto out_put_cgroup; + + /* ... and in may_create() operations only. */ + if ((mask & (MAY_WRITE | MAY_EXEC)) != (MAY_WRITE | MAY_EXEC)) + goto out_put_cgroup; + + /* + * This only applies for cgroups on the default hierarchy, as cgroupv1 + * was not truly hierarchical this operation was not safe. + */ + if (!cgroup_on_dfl(cgroup)) + goto out_put_cgroup; + + cgroupns = current->nsproxy->cgroup_ns; + get_cgroup_ns(cgroupns); + + ret = -EPERM; + if (cgroupns->root_cset->dfl_cgrp == cgroup) { + /* + * Check CAP_SYS_ADMIN, to make sure that unprivileged + * processes inside a cgroup namespace they don't "own" don't + * get any special treatment. + */ + if (ns_capable(cgroupns->user_ns, CAP_SYS_ADMIN)) + ret = 0; + } + + put_cgroup_ns(cgroupns); +out_put_cgroup: + cgroup_put(cgroup); + return ret; +} + static struct kernfs_syscall_ops cgroup_kf_syscall_ops = { .remount_fs = cgroup_remount, .show_options = cgroup_show_options, @@ -5497,6 +5558,7 @@ static struct kernfs_syscall_ops cgroup_kf_syscall_ops = { .rmdir = cgroup_rmdir, .rename = cgroup_rename, .show_path = cgroup_show_path, + .permission = cgroup_permission, }; static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early) -- 2.9.0 ^ permalink raw reply related [flat|nested] 34+ messages in thread
[parent not found: <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 2/3] cgroup: allow for unprivileged subtree management [not found] ` <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org> @ 2016-07-20 15:45 ` Tejun Heo [not found] ` <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-20 15:45 UTC (permalink / raw) To: Aleksa Sarai Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, Aleksa. On Tue, Jul 19, 2016 at 02:18:15AM +1000, Aleksa Sarai wrote: > +static int cgroup_permission(struct inode *inode, struct kernfs_node *kn, > + int mask) > +{ > + int ret; > + struct cgroup *cgroup; > + struct cgroup_namespace *cgroupns; > + > + /* > + * First, compute the generic_permission return value. In most cases > + * this will succeed and we can also avoid duplicating this code. > + */ > + > + cgroup = kn->priv; > + cgroup_get(cgroup); This pattern which is repated for cgroupns doesn't make sense. The code is already assuming that the cgroup is safe to deref. Getting its reference doesn't do anything. Getting it here would only make sense if the pointer is passed to an asynchronous context. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 2/3] cgroup: allow for unprivileged subtree management [not found] ` <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-20 22:59 ` Aleksa Sarai 0 siblings, 0 replies; 34+ messages in thread From: Aleksa Sarai @ 2016-07-20 22:59 UTC (permalink / raw) To: Tejun Heo Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley >> +static int cgroup_permission(struct inode *inode, struct kernfs_node *kn, >> + int mask) >> +{ >> + int ret; >> + struct cgroup *cgroup; >> + struct cgroup_namespace *cgroupns; >> + >> + /* >> + * First, compute the generic_permission return value. In most cases >> + * this will succeed and we can also avoid duplicating this code. >> + */ >> + >> + cgroup = kn->priv; >> + cgroup_get(cgroup); > > This pattern which is repated for cgroupns doesn't make sense. The > code is already assuming that the cgroup is safe to deref. Getting > its reference doesn't do anything. Getting it here would only make > sense if the pointer is passed to an asynchronous context. I'll send out a fixed patchset once we figure out the cgroups_proc_write_permission() stuff. -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org> 2016-07-18 16:18 ` [PATCH v1 2/3] cgroup: allow for unprivileged subtree management Aleksa Sarai @ 2016-07-18 16:18 ` Aleksa Sarai [not found] ` <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org> 1 sibling, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw) To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, Aleksa Sarai, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ If we're moving from a parent to a direct descendant, the only end result (on cgroupv2 hierarchies) is that the process experiences more restrictive resource limits. Thus, there's no reason to restrict processes from moving to direct descendants based on whether or not they have cgroup.procs write access to their current cgroup. This is important for unprivileged subtree management, as it allows unprivileged processes to move to their newly create subtrees. Cc: dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ@public.gmane.org Signed-off-by: Aleksa Sarai <asarai-l3A5Bk7waGM@public.gmane.org> --- kernel/cgroup.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 4559baa7eabd..fa403357ba91 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2859,14 +2859,22 @@ static int cgroup_procs_write_permission(struct task_struct *task, cgrp = task_cgroup_from_root(task, &cgrp_dfl_root); spin_unlock_irq(&css_set_lock); - while (!cgroup_is_descendant(dst_cgrp, cgrp)) - cgrp = cgroup_parent(cgrp); - - ret = -ENOMEM; - inode = kernfs_get_inode(sb, cgrp->procs_file.kn); - if (inode) { - ret = inode_permission(inode, MAY_WRITE); - iput(inode); + /* + * If we are moving to a descendant of our current cgroup, we + * can only further restrict the cgroup limits we must follow. + * Thus, it doesn't make sense to restrict the cgroup.procs + * write. + */ + if (!cgroup_is_descendant(dst_cgrp, cgrp)) { + while (!cgroup_is_descendant(dst_cgrp, cgrp)) + cgrp = cgroup_parent(cgrp); + + ret = -ENOMEM; + inode = kernfs_get_inode(sb, cgrp->procs_file.kn); + if (inode) { + ret = inode_permission(inode, MAY_WRITE); + iput(inode); + } } } -- 2.9.0 ^ permalink raw reply related [flat|nested] 34+ messages in thread
[parent not found: <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org> @ 2016-07-20 15:51 ` Tejun Heo [not found] ` <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-20 15:51 UTC (permalink / raw) To: Aleksa Sarai Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, Aleksa. On Tue, Jul 19, 2016 at 02:18:16AM +1000, Aleksa Sarai wrote: > If we're moving from a parent to a direct descendant, the only end > result (on cgroupv2 hierarchies) is that the process experiences more > restrictive resource limits. Thus, there's no reason to restrict > processes from moving to direct descendants based on whether or not they > have cgroup.procs write access to their current cgroup. > > This is important for unprivileged subtree management, as it allows > unprivileged processes to move to their newly create subtrees. I don't think we can do this as this allows a sub-cgroup to steal an ancestor's process whether the ancestor likes it or not. A process being put in a context where it's more restricted without whatever is managing that part of cgroup hierarchy is not ok, at all. Please also note that nobody expects its processes to be stolen underneath it. This would be a management nightmare. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-20 22:58 ` Aleksa Sarai [not found] ` <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-20 22:58 UTC (permalink / raw) To: Tejun Heo Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley >> If we're moving from a parent to a direct descendant, the only end >> result (on cgroupv2 hierarchies) is that the process experiences more >> restrictive resource limits. Thus, there's no reason to restrict >> processes from moving to direct descendants based on whether or not they >> have cgroup.procs write access to their current cgroup. >> >> This is important for unprivileged subtree management, as it allows >> unprivileged processes to move to their newly create subtrees. > > I don't think we can do this as this allows a sub-cgroup to steal an > ancestor's process whether the ancestor likes it or not. A process > being put in a context where it's more restricted without whatever is > managing that part of cgroup hierarchy is not ok, at all. Please also > note that nobody expects its processes to be stolen underneath it. > This would be a management nightmare. I'm not sure what you mean by "steal". The user doing the migration owns the process, so I would argue that they aren't "stealing" anything. While a higher level process might not know where precisely in the hierarchy the process is, they'll know it that it must be a sub-cgroup of the one they were put in (meaning the parent can still impose restrictions without any issue). If you want, we can make it so that an unprivileged user migrating processes to a child cgroup only works if you're in the same cgroup namespace (and have CAP_SYS_ADMIN in the pinned user namespace, etc). The current setup would obviously still work, but you'd add a permission for users that just want to be able to limit their own processes. IIRC we need to update cgroup_procs_write_permission() anyway. By having the cgroup namespace requirement, you'd definitely have to "own" the process in every sense of the word I can imagine. -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org> @ 2016-07-20 23:02 ` Tejun Heo [not found] ` <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-20 23:02 UTC (permalink / raw) To: Aleksa Sarai Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley Hello, Aleksa. On Thu, Jul 21, 2016 at 08:58:32AM +1000, Aleksa Sarai wrote: > I'm not sure what you mean by "steal". The user doing the migration owns the In the sense that the ancestor cgroup can be modified by one of its descendants even when that descendant doesn't have enough permission to modify the ancestor. > process, so I would argue that they aren't "stealing" anything. While a > higher level process might not know where precisely in the hierarchy the > process is, they'll know it that it must be a sub-cgroup of the one they > were put in (meaning the parent can still impose restrictions without any > issue). Hmmm... it's not just about the ownership of the process itself. If it had been, we wouldn't have bothered with permission model on cgroup hierarchy itself. It's also about who is allowed to modify a given cgroup and what you're proposing violates that. > If you want, we can make it so that an unprivileged user migrating processes > to a child cgroup only works if you're in the same cgroup namespace (and > have CAP_SYS_ADMIN in the pinned user namespace, etc). The current setup > would obviously still work, but you'd add a permission for users that just > want to be able to limit their own processes. IIRC we need to update > cgroup_procs_write_permission() anyway. By having the cgroup namespace > requirement, you'd definitely have to "own" the process in every sense of > the word I can imagine. Maybe I'm misunderstanding but I can't see how that would change the situation in a significant way. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> @ 2016-07-20 23:18 ` Aleksa Sarai [not found] ` <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-20 23:18 UTC (permalink / raw) To: Tejun Heo Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley >> process, so I would argue that they aren't "stealing" anything. While a >> higher level process might not know where precisely in the hierarchy the >> process is, they'll know it that it must be a sub-cgroup of the one they >> were put in (meaning the parent can still impose restrictions without any >> issue). > > Hmmm... it's not just about the ownership of the process itself. If > it had been, we wouldn't have bothered with permission model on cgroup > hierarchy itself. It's also about who is allowed to modify a given > cgroup and what you're proposing violates that. I feel like the permission model makes sense in certain cases (the common ancestor restriction, as well as the ability for a parent to apply limits to children by setting its own limits). Neither of those are violated (if you read the commit that introduced the common ancestor restriction). Maybe if you give me a usecase of when it might be important that a process must not be able to move to a sub-cgroup of its current one, I might be able to understand your concerns? From my perspective, I think that's actually quite useful. >> If you want, we can make it so that an unprivileged user migrating processes >> to a child cgroup only works if you're in the same cgroup namespace (and >> have CAP_SYS_ADMIN in the pinned user namespace, etc). The current setup >> would obviously still work, but you'd add a permission for users that just >> want to be able to limit their own processes. IIRC we need to update >> cgroup_procs_write_permission() anyway. By having the cgroup namespace >> requirement, you'd definitely have to "own" the process in every sense of >> the word I can imagine. > > Maybe I'm misunderstanding but I can't see how that would change the > situation in a significant way. Well, it would avoid the issue of a process being moved against *its* will. The process would have to be complicit in joining (or unsharing) a cgroup namespace. I'm not sure I really agree with the argument that a higher level process should be able to stop a process from imposing more *stringent* limits on itself if the process is complicit in setting those limits (see above). The reason I'm doing this is so that we might be able to _practically_ use cgroups as an unprivileged user (something that will almost certainly be useful to not just the container crowd, but people also planning on using cgroups as advanced forms of rlimits). -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org> @ 2016-07-20 23:19 ` Tejun Heo [not found] ` <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-20 23:19 UTC (permalink / raw) To: Aleksa Sarai Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley Hello, Aleksa. On Thu, Jul 21, 2016 at 09:18:59AM +1000, Aleksa Sarai wrote: > I feel like the permission model makes sense in certain cases (the common > ancestor restriction, as well as the ability for a parent to apply limits to > children by setting its own limits). Neither of those are violated (if you > read the commit that introduced the common ancestor restriction). > > Maybe if you give me a usecase of when it might be important that a process > must not be able to move to a sub-cgroup of its current one, I might be able > to understand your concerns? From my perspective, I think that's actually > quite useful. cgroup is used to keep track of which processes belong where and allowing processes to be moved out of its cgroup like this would be surprising to say the least. > > Maybe I'm misunderstanding but I can't see how that would change the > > situation in a significant way. > > Well, it would avoid the issue of a process being moved against *its* will. > The process would have to be complicit in joining (or unsharing) a cgroup > namespace. I'm not sure I really agree with the argument that a higher level > process should be able to stop a process from imposing more *stringent* > limits on itself if the process is complicit in setting those limits (see > above). > > The reason I'm doing this is so that we might be able to _practically_ use > cgroups as an unprivileged user (something that will almost certainly be > useful to not just the container crowd, but people also planning on using > cgroups as advanced forms of rlimits). I don't get why we need this fragile dance with permissions at all when the same functionality can be achieved by delegating explicitly. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> @ 2016-07-21 7:49 ` Aleksa Sarai [not found] ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-21 7:49 UTC (permalink / raw) To: Tejun Heo Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley >> I feel like the permission model makes sense in certain cases (the common >> ancestor restriction, as well as the ability for a parent to apply limits to >> children by setting its own limits). Neither of those are violated (if you >> read the commit that introduced the common ancestor restriction). >> >> Maybe if you give me a usecase of when it might be important that a process >> must not be able to move to a sub-cgroup of its current one, I might be able >> to understand your concerns? From my perspective, I think that's actually >> quite useful. > > cgroup is used to keep track of which processes belong where and > allowing processes to be moved out of its cgroup like this would be > surprising to say the least. Would you find it acceptable if we added a bit that would make this not happen (you could specify that a cgroup should not allow a process to move itself to a sub-cgroup)? Or an aggregate cgroup.procs that gives you all of the processes in the entire branch of the tree? Surely this is something that can be fixed without unnecessarily restricting users from doing useful things. >> The reason I'm doing this is so that we might be able to _practically_ use >> cgroups as an unprivileged user (something that will almost certainly be >> useful to not just the container crowd, but people also planning on using >> cgroups as advanced forms of rlimits). > > I don't get why we need this fragile dance with permissions at all > when the same functionality can be achieved by delegating explicitly. The key words being "unprivileged user". Currently, if I am a regular user on a system and I want to use the freezer cgroup to pause a process I am running, I have to *go to the administrator and ask them to give me permission to do that*. Why is that necessary? I find it quite troubling that the usecase of an ordinary user on a system trying to use something as useful as cgroups is considered to be "solved" by asking your administrator (or systemd) to do it for you. "Delegating explicitly" is punting on the problem, by saying "just get the administrator to do the setup for you". What if you don't have the opportunity to do that, and it takes you 4 weeks of sending emails for you to get the administrator to do _anything_? This is something I'm trying to fix with my recent work with rootless containers (and quite a few other people are trying to fix it too). Currently we just simply can't do certain operations as an unprivileged user that would be possible *if we could just use cgroups*. Things like the freezer cgroup would be invaluable for containers, and I guarantee that the Chromium and Firefox folks would find it useful to be able to limit browser processes in a similar way. -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org> @ 2016-07-21 14:33 ` Serge E. Hallyn [not found] ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-07-21 14:52 ` Tejun Heo 1 sibling, 1 reply; 34+ messages in thread From: Serge E. Hallyn @ 2016-07-21 14:33 UTC (permalink / raw) To: Aleksa Sarai Cc: Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley Quoting Aleksa Sarai (asarai-l3A5Bk7waGM@public.gmane.org): > >>I feel like the permission model makes sense in certain cases (the common > >>ancestor restriction, as well as the ability for a parent to apply limits to > >>children by setting its own limits). Neither of those are violated (if you > >>read the commit that introduced the common ancestor restriction). > >> > >>Maybe if you give me a usecase of when it might be important that a process > >>must not be able to move to a sub-cgroup of its current one, I might be able > >>to understand your concerns? From my perspective, I think that's actually > >>quite useful. > > > >cgroup is used to keep track of which processes belong where and > >allowing processes to be moved out of its cgroup like this would be > >surprising to say the least. > > Would you find it acceptable if we added a bit that would make this > not happen (you could specify that a cgroup should not allow a > process to move itself to a sub-cgroup)? Or an aggregate > cgroup.procs that gives you all of the processes in the entire > branch of the tree? Surely this is something that can be fixed > without unnecessarily restricting users from doing useful things. > > >>The reason I'm doing this is so that we might be able to _practically_ use > >>cgroups as an unprivileged user (something that will almost certainly be > >>useful to not just the container crowd, but people also planning on using > >>cgroups as advanced forms of rlimits). > > > >I don't get why we need this fragile dance with permissions at all > >when the same functionality can be achieved by delegating explicitly. > > The key words being "unprivileged user". Currently, if I am a > regular user on a system and I want to use the freezer cgroup to > pause a process I am running, I have to *go to the administrator and > ask them to give me permission to do that*. Why is that necessary? I Ths is of course solvable using something like libpam-cgfs or libpam-cgm (and others). Since this sounds like a question of policy, not mechanism, userspace seems like the right place. Is there a downside to that (or, as Tejun put it, "delegating explicitly")? ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> @ 2016-07-21 14:37 ` Aleksa Sarai [not found] ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org> 2016-07-21 14:51 ` James Bottomley 1 sibling, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-21 14:37 UTC (permalink / raw) To: Serge E. Hallyn Cc: Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley >>>> I feel like the permission model makes sense in certain cases (the common >>>> ancestor restriction, as well as the ability for a parent to apply limits to >>>> children by setting its own limits). Neither of those are violated (if you >>>> read the commit that introduced the common ancestor restriction). >>>> >>>> Maybe if you give me a usecase of when it might be important that a process >>>> must not be able to move to a sub-cgroup of its current one, I might be able >>>> to understand your concerns? From my perspective, I think that's actually >>>> quite useful. >>> >>> cgroup is used to keep track of which processes belong where and >>> allowing processes to be moved out of its cgroup like this would be >>> surprising to say the least. >> >> Would you find it acceptable if we added a bit that would make this >> not happen (you could specify that a cgroup should not allow a >> process to move itself to a sub-cgroup)? Or an aggregate >> cgroup.procs that gives you all of the processes in the entire >> branch of the tree? Surely this is something that can be fixed >> without unnecessarily restricting users from doing useful things. >> >>>> The reason I'm doing this is so that we might be able to _practically_ use >>>> cgroups as an unprivileged user (something that will almost certainly be >>>> useful to not just the container crowd, but people also planning on using >>>> cgroups as advanced forms of rlimits). >>> >>> I don't get why we need this fragile dance with permissions at all >>> when the same functionality can be achieved by delegating explicitly. >> >> The key words being "unprivileged user". Currently, if I am a >> regular user on a system and I want to use the freezer cgroup to >> pause a process I am running, I have to *go to the administrator and >> ask them to give me permission to do that*. Why is that necessary? I > > Ths is of course solvable using something like libpam-cgfs or > libpam-cgm (and others). Since this sounds like a question of > policy, not mechanism, userspace seems like the right place. Is > there a downside to that (or, as Tejun put it, "delegating explicitly")? Having a PAM module requires getting an administrator to install the PAM module (and also presumably audit it, not to mention convincing them that your requirement to use containers are significant enough for them to do any work). It's the same problem IMO. I understand that LXC allows you to do this, but it requires that you get an administrator to *install* and support LXC (as well as the shadow-utils setuid binaries too). There are cases where you don't have the freedom to do that, and also "just get someone to give you privileges temporarily" is again punting on the problem. -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org> @ 2016-07-21 15:01 ` Tejun Heo 2016-07-21 15:09 ` Serge E. Hallyn 1 sibling, 0 replies; 34+ messages in thread From: Tejun Heo @ 2016-07-21 15:01 UTC (permalink / raw) To: Aleksa Sarai Cc: Serge E. Hallyn, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley Hello, Aleksa. On Fri, Jul 22, 2016 at 12:37:42AM +1000, Aleksa Sarai wrote: > > Ths is of course solvable using something like libpam-cgfs or > > libpam-cgm (and others). Since this sounds like a question of > > policy, not mechanism, userspace seems like the right place. Is > > there a downside to that (or, as Tejun put it, "delegating explicitly")? > > Having a PAM module requires getting an administrator to install the PAM > module (and also presumably audit it, not to mention convincing them that > your requirement to use containers are significant enough for them to do any > work). It's the same problem IMO. I understand that LXC allows you to do > this, but it requires that you get an administrator to *install* and support > LXC (as well as the shadow-utils setuid binaries too). There are cases where > you don't have the freedom to do that, and also "just get someone to give > you privileges temporarily" is again punting on the problem. The administrator has to install a new kernel to get this feature from kernel side too. I don't think "to bypass admin" is a strong argument for a new kernel feature especially when it's likely to cause subtle issues as in this case. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org> 2016-07-21 15:01 ` Tejun Heo @ 2016-07-21 15:09 ` Serge E. Hallyn 1 sibling, 0 replies; 34+ messages in thread From: Serge E. Hallyn @ 2016-07-21 15:09 UTC (permalink / raw) To: Aleksa Sarai Cc: Serge E. Hallyn, Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley Quoting Aleksa Sarai (asarai-l3A5Bk7waGM@public.gmane.org): > >>>>I feel like the permission model makes sense in certain cases (the common > >>>>ancestor restriction, as well as the ability for a parent to apply limits to > >>>>children by setting its own limits). Neither of those are violated (if you > >>>>read the commit that introduced the common ancestor restriction). > >>>> > >>>>Maybe if you give me a usecase of when it might be important that a process > >>>>must not be able to move to a sub-cgroup of its current one, I might be able > >>>>to understand your concerns? From my perspective, I think that's actually > >>>>quite useful. > >>> > >>>cgroup is used to keep track of which processes belong where and > >>>allowing processes to be moved out of its cgroup like this would be > >>>surprising to say the least. > >> > >>Would you find it acceptable if we added a bit that would make this > >>not happen (you could specify that a cgroup should not allow a > >>process to move itself to a sub-cgroup)? Or an aggregate > >>cgroup.procs that gives you all of the processes in the entire > >>branch of the tree? Surely this is something that can be fixed > >>without unnecessarily restricting users from doing useful things. > >> > >>>>The reason I'm doing this is so that we might be able to _practically_ use > >>>>cgroups as an unprivileged user (something that will almost certainly be > >>>>useful to not just the container crowd, but people also planning on using > >>>>cgroups as advanced forms of rlimits). > >>> > >>>I don't get why we need this fragile dance with permissions at all > >>>when the same functionality can be achieved by delegating explicitly. > >> > >>The key words being "unprivileged user". Currently, if I am a > >>regular user on a system and I want to use the freezer cgroup to > >>pause a process I am running, I have to *go to the administrator and > >>ask them to give me permission to do that*. Why is that necessary? I > > > >Ths is of course solvable using something like libpam-cgfs or > >libpam-cgm (and others). Since this sounds like a question of > >policy, not mechanism, userspace seems like the right place. Is > >there a downside to that (or, as Tejun put it, "delegating explicitly")? > > Having a PAM module requires getting an administrator to install the > PAM module (and also presumably audit it, not to mention convincing > them that your requirement to use containers are significant enough Right, but that's also the upside. Just like user namespaces, it *is* possible that there remain exploitable situations when cgroups are delegated, and it is up to the admin, not the user, to gauge how averse they are to that risk. (Fwiw obviously I am very sympathetic to your goals :) > for them to do any work). It's the same problem IMO. I understand > that LXC allows you to do this, but it requires that you get an > administrator to *install* and support LXC (as well as the > shadow-utils setuid binaries too). There are cases where you don't > have the freedom to do that, and also "just get someone to give you > privileges temporarily" is again punting on the problem. > > -- > Aleksa Sarai > Software Engineer (Containers) > SUSE Linux GmbH > https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-07-21 14:37 ` Aleksa Sarai @ 2016-07-21 14:51 ` James Bottomley [not found] ` <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 1 sibling, 1 reply; 34+ messages in thread From: James Bottomley @ 2016-07-21 14:51 UTC (permalink / raw) To: Serge E. Hallyn, Aleksa Sarai Cc: Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ On Thu, 2016-07-21 at 09:33 -0500, Serge E. Hallyn wrote: > Quoting Aleksa Sarai (asarai-l3A5Bk7waGM@public.gmane.org): > > > > The reason I'm doing this is so that we might be able to > > > > _practically_ use cgroups as an unprivileged user (something > > > > that will almost certainly be useful to not just the container > > > > crowd, but people also planning on using cgroups as advanced > > > > forms of rlimits). > > > > > > I don't get why we need this fragile dance with permissions at > > > all when the same functionality can be achieved by delegating > > > explicitly. > > > > The key words being "unprivileged user". Currently, if I am a > > regular user on a system and I want to use the freezer cgroup to > > pause a process I am running, I have to *go to the administrator > > and ask them to give me permission to do that*. Why is that > > necessary? > > Ths is of course solvable using something like libpam-cgfs or > libpam-cgm (and others). Since this sounds like a question of > policy, not mechanism, userspace seems like the right place. Is > there a downside to that (or, as Tejun put it, "delegating > explicitly")? Unprivileged containers should "just work" by default as much as possible. There are cases where they can't and policy input is required, like the userns mapping to additional ids beyond the current one, we can still set up the default case without intervention (a single mapping root to current id). What I haven't really heard yet in the debate is the policy reason why an unprivileged user shouldn't set up their own cgroups as children of the current ones (inheriting the constraints). I have heard * it would give power to move other tasks to more rigid constraints. To which the answer is only to allow movememnt of tasks in the current cgroupns * It violates the permissions delegation model. This one doesn't really make too much sense to me: in the same way the userns is root in its own domain, cgroups ns is effective root for the restricted cgroups (and only for processes within its ns). Perhaps the question should be asked the other way around: if we were explicitly delegating permission to every user in the system to set up their own sub cgroups, how would you advise it be done? James ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> @ 2016-07-21 14:59 ` Tejun Heo [not found] ` <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-21 14:59 UTC (permalink / raw) To: James Bottomley Cc: Serge E. Hallyn, Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, James. On Thu, Jul 21, 2016 at 07:51:49AM -0700, James Bottomley wrote: > What I haven't really heard yet in the debate is the policy reason why > an unprivileged user shouldn't set up their own cgroups as children of > the current ones (inheriting the constraints). It's not even about policies. The interface just can't support operations like this in a robust way. We can try to hack enough holes at it to make some scenarios work but it's all but guaranteed that such approach is gonna cause painful long-term issues. > I have heard > > * it would give power to move other tasks to more rigid constraints. > To which the answer is only to allow movememnt of tasks in the > current cgroupns > * It violates the permissions delegation model. This one doesn't > really make too much sense to me: in the same way the userns is root > in its own domain, cgroups ns is effective root for the restricted > cgroups (and only for processes within its ns). > > Perhaps the question should be asked the other way around: if we were > explicitly delegating permission to every user in the system to set up > their own sub cgroups, how would you advise it be done? Coordinate in userspace. Request whatever is managing the cgroup hierarchy to set up delegation. It's not like permission model is fully contained in kernel on modern systems anyway. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-21 15:07 ` Aleksa Sarai 2016-07-21 15:04 ` Tejun Heo 0 siblings, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-21 15:07 UTC (permalink / raw) To: Tejun Heo, James Bottomley Cc: Serge E. Hallyn, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ >> I have heard >> >> * it would give power to move other tasks to more rigid constraints. >> To which the answer is only to allow movememnt of tasks in the >> current cgroupns >> * It violates the permissions delegation model. This one doesn't >> really make too much sense to me: in the same way the userns is root >> in its own domain, cgroups ns is effective root for the restricted >> cgroups (and only for processes within its ns). >> >> Perhaps the question should be asked the other way around: if we were >> explicitly delegating permission to every user in the system to set up >> their own sub cgroups, how would you advise it be done? > > Coordinate in userspace. Request whatever is managing the cgroup > hierarchy to set up delegation. It's not like permission model is > fully contained in kernel on modern systems anyway. My experience with certain systemdaemons' cgroup handling doesn't inspire confidence :/ (from the runC side, we've had nothing but issues). Also, how do you even boot into a cgroupv2 system with systemd (I started backporting patches to openSUSE, but it's still not booting)? -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants 2016-07-21 15:07 ` Aleksa Sarai @ 2016-07-21 15:04 ` Tejun Heo 0 siblings, 0 replies; 34+ messages in thread From: Tejun Heo @ 2016-07-21 15:04 UTC (permalink / raw) To: Aleksa Sarai Cc: James Bottomley, Serge E. Hallyn, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel, cgroups, Christian Brauner, dev Hello, Aleksa. On Fri, Jul 22, 2016 at 01:07:13AM +1000, Aleksa Sarai wrote: > > Coordinate in userspace. Request whatever is managing the cgroup > > hierarchy to set up delegation. It's not like permission model is > > fully contained in kernel on modern systems anyway. > > My experience with certain systemdaemons' cgroup handling doesn't inspire > confidence :/ (from the runC side, we've had nothing but issues). Also, how Fix it then. Working around bugs in userland isn't a justifiable rationale for adding new kernel features. > do you even boot into a cgroupv2 system with systemd (I started backporting > patches to openSUSE, but it's still not booting)? With devel branch, just passing in the unified hierarchy boot param seems to work fine here. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org> 2016-07-21 14:33 ` Serge E. Hallyn @ 2016-07-21 14:52 ` Tejun Heo [not found] ` <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 1 sibling, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-21 14:52 UTC (permalink / raw) To: Aleksa Sarai Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley Hello, Aleksa. On Thu, Jul 21, 2016 at 05:49:36PM +1000, Aleksa Sarai wrote: > > > The reason I'm doing this is so that we might be able to _practically_ use > > > cgroups as an unprivileged user (something that will almost certainly be > > > useful to not just the container crowd, but people also planning on using > > > cgroups as advanced forms of rlimits). > > > > I don't get why we need this fragile dance with permissions at all > > when the same functionality can be achieved by delegating explicitly. > > The key words being "unprivileged user". Currently, if I am a regular user > on a system and I want to use the freezer cgroup to pause a process I am > running, I have to *go to the administrator and ask them to give me > permission to do that*. Why is that necessary? I find it quite troubling > that the usecase of an ordinary user on a system trying to use something as > useful as cgroups is considered to be "solved" by asking your administrator > (or systemd) to do it for you. "Delegating explicitly" is punting on the > problem, by saying "just get the administrator to do the setup for you". > What if you don't have the opportunity to do that, and it takes you 4 weeks > of sending emails for you to get the administrator to do _anything_? > > This is something I'm trying to fix with my recent work with rootless > containers (and quite a few other people are trying to fix it too). > Currently we just simply can't do certain operations as an unprivileged user > that would be possible *if we could just use cgroups*. Things like the > freezer cgroup would be invaluable for containers, and I guarantee that the > Chromium and Firefox folks would find it useful to be able to limit browser > processes in a similar way. I understand what you're trying to achieve but don't think cgroup's filesystem interface can accomodate that. To support that level of automatic delegation, the API should be providing enough isolation so that operations in one domain (user-specific operations) are transparent from the other (system-wide administration), which simply isn't true for cgroupfs. As a simple example, imagine a process being moved to another cgroup racing against the special operations you're describing ahead. Both sides are multi-step operations and there are no ways of synchronizing against each other from kernel side and the outcomes can easily be non-sensical. It is unfortunate but we started with and are bound to carry the current vfs based interface which was never designed to support the use cases you're describing in a seamless way and that's why cgroup supports explicit delegation so that userland can take over the necessary coordination and implement more complex operations atop. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-21 15:04 ` James Bottomley [not found] ` <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: James Bottomley @ 2016-07-21 15:04 UTC (permalink / raw) To: Tejun Heo, Aleksa Sarai Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ On Thu, 2016-07-21 at 10:52 -0400, Tejun Heo wrote: > Hello, Aleksa. > > On Thu, Jul 21, 2016 at 05:49:36PM +1000, Aleksa Sarai wrote: > > > > The reason I'm doing this is so that we might be able to > > > > _practically_ use cgroups as an unprivileged user (something > > > > that will almost certainly be useful to not just the container > > > > crowd, but people also planning on using cgroups as advanced > > > > forms of rlimits). > > > > > > I don't get why we need this fragile dance with permissions at > > > all when the same functionality can be achieved by delegating > > > explicitly. > > > > The key words being "unprivileged user". Currently, if I am a > > regular user on a system and I want to use the freezer cgroup to > > pause a process I am running, I have to *go to the administrator > > and ask them to give me permission to do that*. Why is that > > necessary? I find it quite troubling that the usecase of an > > ordinary user on a system trying to use something as useful as > > cgroups is considered to be "solved" by asking your administrator > > (or systemd) to do it for you. "Delegating explicitly" is punting > > on the problem, by saying "just get the administrator to do the > > setup for you". What if you don't have the opportunity to do that, > > and it takes you 4 weeks of sending emails for you to get the > > administrator to do _anything_? > > > > This is something I'm trying to fix with my recent work with > > rootless containers (and quite a few other people are trying to fix > > it too). Currently we just simply can't do certain operations as an > > unprivileged user that would be possible *if we could just use > > cgroups*. Things like the freezer cgroup would be invaluable for > > containers, and I guarantee that the Chromium and Firefox folks > > would find it useful to be able to limit browser processes in a > > similar way. > > I understand what you're trying to achieve but don't think cgroup's > filesystem interface can accomodate that. To support that level of > automatic delegation, the API should be providing enough isolation so > that operations in one domain (user-specific operations) are > transparent from the other (system-wide administration), which simply > isn't true for cgroupfs. As a simple example, imagine a process > being moved to another cgroup racing against the special operations > you're describing ahead. Both sides are multi-step operations and > there are no ways of synchronizing against each other from kernel > side and the outcomes can easily be non-sensical. So if I understand, it's not about actually moving the tasks: echoing the pid to the tasks file is atomic and we can mediate races there. It's about the debris left behind if the admin (or someone with delegated authority) moves the task to a wholly different cgroup. Now we have a cgroup directory in the old cgroup, which the current task has been removed from, for which the current user has permissions and could then move the task back to. Is that the essence of the problem? James ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> @ 2016-07-21 15:07 ` Tejun Heo [not found] ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-21 15:07 UTC (permalink / raw) To: James Bottomley Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, James. On Thu, Jul 21, 2016 at 08:04:16AM -0700, James Bottomley wrote: > > I understand what you're trying to achieve but don't think cgroup's > > filesystem interface can accomodate that. To support that level of > > automatic delegation, the API should be providing enough isolation so > > that operations in one domain (user-specific operations) are > > transparent from the other (system-wide administration), which simply > > isn't true for cgroupfs. As a simple example, imagine a process > > being moved to another cgroup racing against the special operations > > you're describing ahead. Both sides are multi-step operations and > > there are no ways of synchronizing against each other from kernel > > side and the outcomes can easily be non-sensical. > > So if I understand, it's not about actually moving the tasks: echoing > the pid to the tasks file is atomic and we can mediate races there. Yeah, each operation is atomic but most meaningul operations are multi-step. > It's about the debris left behind if the admin (or someone with > delegated authority) moves the task to a wholly different cgroup. > > Now we have a cgroup directory in the old cgroup, which the current > task has been removed from, for which the current user has permissions > and could then move the task back to. Is that the essence of the > problem? That'd be one side. The other side is the one moving. Let's say the system admin thing wants to move all processe from A proper to B. It would do that by draining processes from A's procs file into B's and even that is multistep and can race. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-21 15:16 ` James Bottomley 2016-07-21 15:26 ` Tejun Heo 2016-07-22 8:24 ` Aleksa Sarai 1 sibling, 1 reply; 34+ messages in thread From: James Bottomley @ 2016-07-21 15:16 UTC (permalink / raw) To: Tejun Heo Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ On Thu, 2016-07-21 at 11:07 -0400, Tejun Heo wrote: > Hello, James. > > On Thu, Jul 21, 2016 at 08:04:16AM -0700, James Bottomley wrote: > > > I understand what you're trying to achieve but don't think > > > cgroup's filesystem interface can accomodate that. To support > > > that level of automatic delegation, the API should be providing > > > enough isolation so that operations in one domain (user-specific > > > operations) are transparent from the other (system-wide > > > administration), which simply isn't true for cgroupfs. As a > > > simple example, imagine a process being moved to another cgroup > > > racing against the special operations you're describing ahead. > > > Both sides are multi-step operations and there are no ways of > > > synchronizing against each other from kernel side and the > > > outcomes can easily be non-sensical. > > > > So if I understand, it's not about actually moving the tasks: > > echoing the pid to the tasks file is atomic and we can mediate > > races there. > > Yeah, each operation is atomic but most meaningul operations are > multi-step. > > > It's about the debris left behind if the admin (or someone with > > delegated authority) moves the task to a wholly different cgroup. > > > > Now we have a cgroup directory in the old cgroup, which the current > > task has been removed from, for which the current user has > > permissions and could then move the task back to. Is that the > > essence of the problem? > > That'd be one side. The other side is the one moving. Let's say the > system admin thing wants to move all processe from A proper to B. It > would do that by draining processes from A's procs file into B's and > even that is multistep and can race. So the second part is that once we allow the creation of subdirectories, there's no unified tasks file, so there's no way of draining A proper without enumerating and descending into the cgroupns created subtrees in A? James ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants 2016-07-21 15:16 ` James Bottomley @ 2016-07-21 15:26 ` Tejun Heo [not found] ` <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-21 15:26 UTC (permalink / raw) To: James Bottomley Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel, cgroups, Christian Brauner, dev Hello, James. On Thu, Jul 21, 2016 at 08:16:34AM -0700, James Bottomley wrote: > > That'd be one side. The other side is the one moving. Let's say the > > system admin thing wants to move all processe from A proper to B. It > > would do that by draining processes from A's procs file into B's and > > even that is multistep and can race. > > So the second part is that once we allow the creation of > subdirectories, there's no unified tasks file, so there's no way of > draining A proper without enumerating and descending into the cgroupns > created subtrees in A? Not that. If it races, it will end up moving processes which are no longer in A proper. Such operations or distinctions might not be meaningful under many circumstances but it'd be a pretty big hole in the API to create and I can't really declare these holes are gonna be okay with confidence. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-21 15:34 ` James Bottomley [not found] ` <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: James Bottomley @ 2016-07-21 15:34 UTC (permalink / raw) To: Tejun Heo Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ On Thu, 2016-07-21 at 11:26 -0400, Tejun Heo wrote: > Hello, James. > > On Thu, Jul 21, 2016 at 08:16:34AM -0700, James Bottomley wrote: > > > That'd be one side. The other side is the one moving. Let's say > > > the system admin thing wants to move all processe from A proper > > > to B. It would do that by draining processes from A's procs > > > file into B's and even that is multistep and can race. > > > > So the second part is that once we allow the creation of > > subdirectories, there's no unified tasks file, so there's no way of > > draining A proper without enumerating and descending into the > > cgroupns created subtrees in A? > > Not that. If it races, it will end up moving processes which are no > longer in A proper. So if I as the cgroup ns owner am moving a task from A to A_subdir, the admin scanning tasks in all of A may miss this task in motion because all the tasks files can't be scanned atomically? James ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> @ 2016-07-21 15:50 ` Tejun Heo [not found] ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-22 8:30 ` Aleksa Sarai 0 siblings, 2 replies; 34+ messages in thread From: Tejun Heo @ 2016-07-21 15:50 UTC (permalink / raw) To: James Bottomley Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, James. On Thu, Jul 21, 2016 at 08:34:36AM -0700, James Bottomley wrote: > So if I as the cgroup ns owner am moving a task from A to A_subdir, the > admin scanning tasks in all of A may miss this task in motion because > all the tasks files can't be scanned atomically? So, the admin just wants to move processes from A and only A to B. It doesn't wanna interfere with processes in the subdirs or on-going ns operations, but if the race occurs, both A -> B migration and ns subdir operation would succeed and the end result would be something neither expects. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-21 18:16 ` James Bottomley [not found] ` <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: James Bottomley @ 2016-07-21 18:16 UTC (permalink / raw) To: Tejun Heo Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ On Thu, 2016-07-21 at 11:50 -0400, Tejun Heo wrote: > Hello, James. > > On Thu, Jul 21, 2016 at 08:34:36AM -0700, James Bottomley wrote: > > So if I as the cgroup ns owner am moving a task from A to A_subdir, > > the admin scanning tasks in all of A may miss this task in motion > > because all the tasks files can't be scanned atomically? > > So, the admin just wants to move processes from A and only A to B. > It doesn't wanna interfere with processes in the subdirs or on-going > ns operations, but if the race occurs, both A -> B migration and ns > subdir operation would succeed and the end result would be something > neither expects. OK so a theoretical (not saying it's implementable, we'll have to explore that) way of fixing all of this is to have separate views of the tree. If the admin always saw everything in A, even if the cgroupns had created subdirectories in its own namespace. That way there'd be no race ever in the admin's view (because it's the view they created and would expect to see). All sub cgroup activity would only be visible to tasks in the new cgroupns (we'd probably have to have them make this visible by mounting a new cgroup tree). James ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> @ 2016-07-21 21:06 ` Tejun Heo 0 siblings, 0 replies; 34+ messages in thread From: Tejun Heo @ 2016-07-21 21:06 UTC (permalink / raw) To: James Bottomley Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, On Thu, Jul 21, 2016 at 11:16:42AM -0700, James Bottomley wrote: > OK so a theoretical (not saying it's implementable, we'll have to > explore that) way of fixing all of this is to have separate views of > the tree. If the admin always saw everything in A, even if the > cgroupns had created subdirectories in its own namespace. That way > there'd be no race ever in the admin's view (because it's the view they > created and would expect to see). All sub cgroup activity would only > be visible to tasks in the new cgroupns (we'd probably have to have > them make this visible by mounting a new cgroup tree). Yeah, something like that. The two domains of operation need to be transparent to each other so that things taking place at system level doesn't interfere with user level operations and vice-versa. It's likely that implementing something like that within filesystem based interface won't work out too well. There are too many expected behaviors from being a filesystem which don't quite agree with such abstraction. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants 2016-07-21 15:50 ` Tejun Heo [not found] ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> @ 2016-07-22 8:30 ` Aleksa Sarai [not found] ` <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org> 1 sibling, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-22 8:30 UTC (permalink / raw) To: Tejun Heo, James Bottomley Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel, cgroups, Christian Brauner, dev >> So if I as the cgroup ns owner am moving a task from A to A_subdir, the >> admin scanning tasks in all of A may miss this task in motion because >> all the tasks files can't be scanned atomically? > > So, the admin just wants to move processes from A and only A to B. It > doesn't wanna interfere with processes in the subdirs or on-going ns > operations, but if the race occurs, both A -> B migration and ns > subdir operation would succeed and the end result would be something > neither expects. Just to be clear, the "ns subdir operation" is a cgroup namespaced process moving A -> A_subdir which is racing against some administrative process moving everything from A -> B (but not wanting to move A -> A_subdir)? So should there be policy within the kernel to not permit a process outside a cgroup namespace to move processes inside the namespace? Or would you be concerned about people escaping the administrator's attempts to reorganise the hierarchy? What if we extended rename(2) so that it /does/ allow for reorganisation of the hierarchy? So an administrator could use rename to change the point at which a cgroupns root is rooted at, but not be able to move the actual processes within the cgroup namespace around? The administrator could also join the cgroupns (without needing to join the userns) and then just move things around that way? Do any of those suggestions seem reasonable? -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org> @ 2016-07-25 18:38 ` Tejun Heo [not found] ` <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Tejun Heo @ 2016-07-25 18:38 UTC (permalink / raw) To: Aleksa Sarai Cc: James Bottomley, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Hello, Aleksa. On Fri, Jul 22, 2016 at 06:30:07PM +1000, Aleksa Sarai wrote: > Just to be clear, the "ns subdir operation" is a cgroup namespaced process > moving A -> A_subdir which is racing against some administrative process > moving everything from A -> B (but not wanting to move A -> A_subdir)? Yes. > So should there be policy within the kernel to not permit a process outside > a cgroup namespace to move processes inside the namespace? Or would you be > concerned about people escaping the administrator's attempts to reorganise > the hierarchy? Pushed that far, I frankly can't assess what the implications and side-effects would be. > What if we extended rename(2) so that it /does/ allow for reorganisation of > the hierarchy? So an administrator could use rename to change the point at > which a cgroupns root is rooted at, but not be able to move the actual > processes within the cgroup namespace around? The administrator could also > join the cgroupns (without needing to join the userns) and then just move > things around that way? > > Do any of those suggestions seem reasonable? Unfortunately not. I get what you're trying to do and am sure we can make some specific scenarios work with the right set of hacks and holes, but this type of approach is very dangerous in the long term. The downside we have now is that we need an explicit delegation from userland and that stems from the architectural constraints of cgroupfs. It's not ideal but an acceptable situation. Let's please not riddle the whole thing with holes that we don't understand for an inconvenience which can be worked around otherwise. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> @ 2016-07-25 22:54 ` Serge E. Hallyn 0 siblings, 0 replies; 34+ messages in thread From: Serge E. Hallyn @ 2016-07-25 22:54 UTC (permalink / raw) To: Tejun Heo Cc: Aleksa Sarai, James Bottomley, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org): > Hello, Aleksa. > > On Fri, Jul 22, 2016 at 06:30:07PM +1000, Aleksa Sarai wrote: > > Just to be clear, the "ns subdir operation" is a cgroup namespaced process > > moving A -> A_subdir which is racing against some administrative process > > moving everything from A -> B (but not wanting to move A -> A_subdir)? > > Yes. > > > So should there be policy within the kernel to not permit a process outside > > a cgroup namespace to move processes inside the namespace? Or would you be > > concerned about people escaping the administrator's attempts to reorganise > > the hierarchy? > > Pushed that far, I frankly can't assess what the implications and > side-effects would be. > > > What if we extended rename(2) so that it /does/ allow for reorganisation of > > the hierarchy? So an administrator could use rename to change the point at > > which a cgroupns root is rooted at, but not be able to move the actual > > processes within the cgroup namespace around? The administrator could also > > join the cgroupns (without needing to join the userns) and then just move > > things around that way? > > > > Do any of those suggestions seem reasonable? > > Unfortunately not. I get what you're trying to do and am sure we can > make some specific scenarios work with the right set of hacks and > holes, but this type of approach is very dangerous in the long term. > > The downside we have now is that we need an explicit delegation from > userland and that stems from the architectural constraints of > cgroupfs. It's not ideal but an acceptable situation. Let's please > not riddle the whole thing with holes that we don't understand for an > inconvenience which can be worked around otherwise. It's something which distros, post-machine-install scripts, and post-pkg-install scripts can all very easily setup. They have to setup subuid and subgid too if you want safe unprivileged containers, so I really don't feel this is so arduous. What is the use case where none of these are an option? (I don't doubt that it exists since you are pushing pretty hard) -serge ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants [not found] ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-21 15:16 ` James Bottomley @ 2016-07-22 8:24 ` Aleksa Sarai 2016-07-25 18:44 ` Tejun Heo 1 sibling, 1 reply; 34+ messages in thread From: Aleksa Sarai @ 2016-07-22 8:24 UTC (permalink / raw) To: Tejun Heo, James Bottomley Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ >> It's about the debris left behind if the admin (or someone with >> delegated authority) moves the task to a wholly different cgroup. >> >> Now we have a cgroup directory in the old cgroup, which the current >> task has been removed from, for which the current user has permissions >> and could then move the task back to. Is that the essence of the >> problem? > > That'd be one side. The other side is the one moving. Let's say the > system admin thing wants to move all processe from A proper to B. It > would do that by draining processes from A's procs file into B's and > even that is multistep and can race. Once freezer is ported, wouldn't that allow you to stop the processes so you can drain them? I understand your concern with draining, but surely the same races occur if you fork? How many times would you need to scan cgroup.procs to make sure that you didn't miss anything (and if there's enough processes then cgroup.procs reads aren't atomic either). -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants 2016-07-22 8:24 ` Aleksa Sarai @ 2016-07-25 18:44 ` Tejun Heo 0 siblings, 0 replies; 34+ messages in thread From: Tejun Heo @ 2016-07-25 18:44 UTC (permalink / raw) To: Aleksa Sarai Cc: James Bottomley, Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel, cgroups, Christian Brauner, dev On Fri, Jul 22, 2016 at 06:24:25PM +1000, Aleksa Sarai wrote: > > > It's about the debris left behind if the admin (or someone with > > > delegated authority) moves the task to a wholly different cgroup. > > > > > > Now we have a cgroup directory in the old cgroup, which the current > > > task has been removed from, for which the current user has permissions > > > and could then move the task back to. Is that the essence of the > > > problem? > > > > That'd be one side. The other side is the one moving. Let's say the > > system admin thing wants to move all processe from A proper to B. It > > would do that by draining processes from A's procs file into B's and > > even that is multistep and can race. > > Once freezer is ported, wouldn't that allow you to stop the processes so you > can drain them? I understand your concern with draining, but surely the same > races occur if you fork? How many times would you need to scan cgroup.procs > to make sure that you didn't miss anything (and if there's enough processes > then cgroup.procs reads aren't atomic either). Sure, draining has to be iterative and in actual use cases freezing would help but I think you're misunderstanding why I gave the example. It wasn't meant to be "if you solve this problem case, we're all good". It was an illustration of the underlying problems where the basic interface isn't fit for this sort of complex semantics. Please take a step back and look at the larger picture. The current interface is silly in many areas but it's still mostly integral and I'd really like to keep at least that. If there is a way to do this in a way which is not hacky, we can definitely look into it. However, given that the imminent problem is relatively small, I'm not too sure that there would be a solution of justifiable size. Thanks. -- tejun ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2016-07-25 22:54 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-07-18 16:18 [PATCH v1 0/3] cgroup: allow for unprivileged management Aleksa Sarai 2016-07-18 16:18 ` [PATCH v1 1/3] kernfs: add support for custom per-sb permission hooks Aleksa Sarai [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org> 2016-07-18 16:18 ` [PATCH v1 2/3] cgroup: allow for unprivileged subtree management Aleksa Sarai [not found] ` <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org> 2016-07-20 15:45 ` Tejun Heo [not found] ` <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-20 22:59 ` Aleksa Sarai 2016-07-18 16:18 ` [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants Aleksa Sarai [not found] ` <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org> 2016-07-20 15:51 ` Tejun Heo [not found] ` <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-20 22:58 ` Aleksa Sarai [not found] ` <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org> 2016-07-20 23:02 ` Tejun Heo [not found] ` <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> 2016-07-20 23:18 ` Aleksa Sarai [not found] ` <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org> 2016-07-20 23:19 ` Tejun Heo [not found] ` <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> 2016-07-21 7:49 ` Aleksa Sarai [not found] ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org> 2016-07-21 14:33 ` Serge E. Hallyn [not found] ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org> 2016-07-21 14:37 ` Aleksa Sarai [not found] ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org> 2016-07-21 15:01 ` Tejun Heo 2016-07-21 15:09 ` Serge E. Hallyn 2016-07-21 14:51 ` James Bottomley [not found] ` <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 2016-07-21 14:59 ` Tejun Heo [not found] ` <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-21 15:07 ` Aleksa Sarai 2016-07-21 15:04 ` Tejun Heo 2016-07-21 14:52 ` Tejun Heo [not found] ` <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-21 15:04 ` James Bottomley [not found] ` <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 2016-07-21 15:07 ` Tejun Heo [not found] ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-21 15:16 ` James Bottomley 2016-07-21 15:26 ` Tejun Heo [not found] ` <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-21 15:34 ` James Bottomley [not found] ` <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 2016-07-21 15:50 ` Tejun Heo [not found] ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org> 2016-07-21 18:16 ` James Bottomley [not found] ` <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 2016-07-21 21:06 ` Tejun Heo 2016-07-22 8:30 ` Aleksa Sarai [not found] ` <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org> 2016-07-25 18:38 ` Tejun Heo [not found] ` <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org> 2016-07-25 22:54 ` Serge E. Hallyn 2016-07-22 8:24 ` Aleksa Sarai 2016-07-25 18:44 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).