[PATCH v1 0/3] cgroup: allow for unprivileged management

cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v1 0/3] cgroup: allow for unprivileged management
@ 2016-07-18 16:18 Aleksa Sarai
  2016-07-18 16:18 ` [PATCH v1 1/3] kernfs: add support for custom per-sb permission hooks Aleksa Sarai
       [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 2 replies; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson
  Cc: linux-kernel, cgroups, Christian Brauner, Aleksa Sarai, dev

This is a rewrite of my old cgroup unprivileged subtree management[1]
patchset. Rather than magically creating a new cgroup, I've instead
modified kernfs so that we can have custom permission hooks. The
following only applies to cgroupv2 trees, due to the fact that cgroupv1
doesn't explicitly require that cgroups be hierarchical.

You can only create a new subtree if you either would traditionally have
write access, or you are attempting to create a new cgroup under the
root cgroup of your current cgroup namespace (and you have CAP_SYS_ADMIN
in the user namespace pinned by the cgroup namespace). This means that
users would only be able to create sub-cgroups of their current cgroup
using this method.

In addition, I relaxed one of the ancestor restrictions so that you can
move to direct descendants of the current cgroup without needing to be
able to join the current cgroup you're in (because that restriction
doesn't make much sense).

[1]: http://marc.info/?l=linux-kernel&m=146319604331859

Cc: dev@opencontainers.org

Aleksa Sarai (3):
  kernfs: add support for custom per-sb permission hooks
  cgroup: allow for unprivileged subtree management
  cgroup: relax common ancestor restriction for direct descendants

 fs/kernfs/inode.c      | 13 +++++++-
 include/linux/kernfs.h |  3 ++
 kernel/cgroup.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 93 insertions(+), 9 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v1 1/3] kernfs: add support for custom per-sb permission hooks
  2016-07-18 16:18 [PATCH v1 0/3] cgroup: allow for unprivileged management Aleksa Sarai
@ 2016-07-18 16:18 ` Aleksa Sarai
       [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>
  1 sibling, 0 replies; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson
  Cc: linux-kernel, cgroups, Christian Brauner, Aleksa Sarai, dev

This allows for users of kernfs to create custom (and possibly less
restrictive) permission checks for kernfs_nodes. The default is
unchanged. This patch is part of the cgroupns unprivileged subtree
management patchset.

Cc: dev@opencontainers.org
Signed-off-by: Aleksa Sarai <asarai@suse.de>
---
 fs/kernfs/inode.c      | 13 ++++++++++++-
 include/linux/kernfs.h |  3 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 63b925d5ba1e..e82b8e5aa643 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -364,15 +364,26 @@ void kernfs_evict_inode(struct inode *inode)
 int kernfs_iop_permission(struct inode *inode, int mask)
 {
 	struct kernfs_node *kn;
+	struct kernfs_syscall_ops *scops;
+	int ret;
 
 	if (mask & MAY_NOT_BLOCK)
 		return -ECHILD;
 
 	kn = inode->i_private;
+	if (!kernfs_get_active(kn))
+		return -ENODEV;
 
 	mutex_lock(&kernfs_mutex);
 	kernfs_refresh_inode(kn, inode);
 	mutex_unlock(&kernfs_mutex);
 
-	return generic_permission(inode, mask);
+	scops = kernfs_root(kn)->syscall_ops;
+	if (unlikely(scops && scops->permission))
+		ret = scops->permission(inode, kn, mask);
+	else
+		ret = generic_permission(inode, mask);
+
+	kernfs_put_active(kn);
+	return ret;
 }
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 96356ef012de..373b5a888a81 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -16,6 +16,7 @@
 #include <linux/rbtree.h>
 #include <linux/atomic.h>
 #include <linux/wait.h>
+#include <linux/fs.h>
 
 struct file;
 struct dentry;
@@ -154,6 +155,8 @@ struct kernfs_syscall_ops {
 		      const char *new_name);
 	int (*show_path)(struct seq_file *sf, struct kernfs_node *kn,
 			 struct kernfs_root *root);
+	int (*permission)(struct inode *inode, struct kernfs_node *kn,
+			  int mask);
 };
 
 struct kernfs_root {
-- 
2.9.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

[parent not found: <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>]

* [PATCH v1 2/3] cgroup: allow for unprivileged subtree management
       [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-18 16:18   ` Aleksa Sarai
       [not found]     ` <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org>
  2016-07-18 16:18   ` [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants Aleksa Sarai
  1 sibling, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, Aleksa Sarai,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Use the new custom ->permission hook to allow unprivileged processes to
mkdir new sub-cgroup directories of the root_cset of their current
cgroup namespace. No process outside of the cgroup namespace (or in a
sub-namespace) has this ability, and thus a process must have sufficient
privileges to setns to a cgroup namespace in order to create cgroups in
a cgroup they are not currently residing in.

Only privileged processes in the user namespace pinned to the cgroup
namespace have this new ability. This further restricts any oddness from
happening with the creation of many cgroups which the process cannot
effectively join.

This change only applies to the default hierarchy, as cgroupv1 cgroups
are not necessarily hierarchical (thus allowing the creating of new
sub-cgroups would allow for circumvention of cgroup limits). However,
since cgroupv2 cgroups are strictly hierarchical as a design constraint
this is possible.

It should be noted that cgroupv2 also has attaching restrictions that
make this process safe against two complicit processes from migrating
a process to the less restrictive cgroup of the two.

Cc: dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ@public.gmane.org
Signed-off-by: Aleksa Sarai <asarai-l3A5Bk7waGM@public.gmane.org>
---
 kernel/cgroup.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 8647f3112f5c..4559baa7eabd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5490,6 +5490,67 @@ static int cgroup_rmdir(struct kernfs_node *kn)
 	return ret;
 }
 
+/*
+ * We have specific rules when deciding if a process can write to a cgroup
+ * directory, based on their current state inside cgroupns.
+ */
+static int cgroup_permission(struct inode *inode, struct kernfs_node *kn,
+			     int mask)
+{
+	int ret;
+	struct cgroup *cgroup;
+	struct cgroup_namespace *cgroupns;
+
+	/*
+	 * First, compute the generic_permission return value. In most cases
+	 * this will succeed and we can also avoid duplicating this code.
+	 */
+
+	cgroup = kn->priv;
+	cgroup_get(cgroup);
+
+	/* First, try the generic method which should work in most cases. */
+	ret = generic_permission(inode, mask);
+
+	/* If the generic check succeeded, then we're all good. */
+	if (!ret)
+		goto out_put_cgroup;
+
+	/* We're only interested in cgroup directories. */
+	if (kernfs_type(kn) != KERNFS_DIR)
+		goto out_put_cgroup;
+
+	/* ... and in may_create() operations only. */
+	if ((mask & (MAY_WRITE | MAY_EXEC)) != (MAY_WRITE | MAY_EXEC))
+		goto out_put_cgroup;
+
+	/*
+	 * This only applies for cgroups on the default hierarchy, as cgroupv1
+	 * was not truly hierarchical this operation was not safe.
+	 */
+	if (!cgroup_on_dfl(cgroup))
+		goto out_put_cgroup;
+
+	cgroupns = current->nsproxy->cgroup_ns;
+	get_cgroup_ns(cgroupns);
+
+	ret = -EPERM;
+	if (cgroupns->root_cset->dfl_cgrp == cgroup) {
+		/*
+		 * Check CAP_SYS_ADMIN, to make sure that unprivileged
+		 * processes inside a cgroup namespace they don't "own" don't
+		 * get any special treatment.
+		 */
+		if (ns_capable(cgroupns->user_ns, CAP_SYS_ADMIN))
+			ret = 0;
+	}
+
+	put_cgroup_ns(cgroupns);
+out_put_cgroup:
+	cgroup_put(cgroup);
+	return ret;
+}
+
 static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
 	.remount_fs		= cgroup_remount,
 	.show_options		= cgroup_show_options,
@@ -5497,6 +5558,7 @@ static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
 	.rmdir			= cgroup_rmdir,
 	.rename			= cgroup_rename,
 	.show_path		= cgroup_show_path,
+	.permission		= cgroup_permission,
 };
 
 static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

[parent not found: <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 2/3] cgroup: allow for unprivileged subtree management
       [not found]     ` <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-20 15:45       ` Tejun Heo
       [not found]         ` <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-20 15:45 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello, Aleksa.

On Tue, Jul 19, 2016 at 02:18:15AM +1000, Aleksa Sarai wrote:
> +static int cgroup_permission(struct inode *inode, struct kernfs_node *kn,
> +			     int mask)
> +{
> +	int ret;
> +	struct cgroup *cgroup;
> +	struct cgroup_namespace *cgroupns;
> +
> +	/*
> +	 * First, compute the generic_permission return value. In most cases
> +	 * this will succeed and we can also avoid duplicating this code.
> +	 */
> +
> +	cgroup = kn->priv;
> +	cgroup_get(cgroup);

This pattern which is repated for cgroupns doesn't make sense.  The
code is already assuming that the cgroup is safe to deref.  Getting
its reference doesn't do anything.  Getting it here would only make
sense if the pointer is passed to an asynchronous context.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 2/3] cgroup: allow for unprivileged subtree management
       [not found]         ` <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-20 22:59           ` Aleksa Sarai
  0 siblings, 0 replies; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-20 22:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

>> +static int cgroup_permission(struct inode *inode, struct kernfs_node *kn,
>> +			     int mask)
>> +{
>> +	int ret;
>> +	struct cgroup *cgroup;
>> +	struct cgroup_namespace *cgroupns;
>> +
>> +	/*
>> +	 * First, compute the generic_permission return value. In most cases
>> +	 * this will succeed and we can also avoid duplicating this code.
>> +	 */
>> +
>> +	cgroup = kn->priv;
>> +	cgroup_get(cgroup);
>
> This pattern which is repated for cgroupns doesn't make sense.  The
> code is already assuming that the cgroup is safe to deref.  Getting
> its reference doesn't do anything.  Getting it here would only make
> sense if the pointer is passed to an asynchronous context.

I'll send out a fixed patchset once we figure out the 
cgroups_proc_write_permission() stuff.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>
  2016-07-18 16:18   ` [PATCH v1 2/3] cgroup: allow for unprivileged subtree management Aleksa Sarai
@ 2016-07-18 16:18   ` Aleksa Sarai
       [not found]     ` <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-18 16:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner, Aleksa Sarai,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

If we're moving from a parent to a direct descendant, the only end
result (on cgroupv2 hierarchies) is that the process experiences more
restrictive resource limits. Thus, there's no reason to restrict
processes from moving to direct descendants based on whether or not they
have cgroup.procs write access to their current cgroup.

This is important for unprivileged subtree management, as it allows
unprivileged processes to move to their newly create subtrees.

Cc: dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ@public.gmane.org
Signed-off-by: Aleksa Sarai <asarai-l3A5Bk7waGM@public.gmane.org>
---
 kernel/cgroup.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4559baa7eabd..fa403357ba91 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2859,14 +2859,22 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
 		spin_unlock_irq(&css_set_lock);
 
-		while (!cgroup_is_descendant(dst_cgrp, cgrp))
-			cgrp = cgroup_parent(cgrp);
-
-		ret = -ENOMEM;
-		inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
-		if (inode) {
-			ret = inode_permission(inode, MAY_WRITE);
-			iput(inode);
+		/*
+		 * If we are moving to a descendant of our current cgroup, we
+		 * can only further restrict the cgroup limits we must follow.
+		 * Thus, it doesn't make sense to restrict the cgroup.procs
+		 * write.
+		 */
+		if (!cgroup_is_descendant(dst_cgrp, cgrp)) {
+			while (!cgroup_is_descendant(dst_cgrp, cgrp))
+				cgrp = cgroup_parent(cgrp);
+
+			ret = -ENOMEM;
+			inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
+			if (inode) {
+				ret = inode_permission(inode, MAY_WRITE);
+				iput(inode);
+			}
 		}
 	}
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

[parent not found: <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]     ` <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-20 15:51       ` Tejun Heo
       [not found]         ` <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-20 15:51 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello, Aleksa.

On Tue, Jul 19, 2016 at 02:18:16AM +1000, Aleksa Sarai wrote:
> If we're moving from a parent to a direct descendant, the only end
> result (on cgroupv2 hierarchies) is that the process experiences more
> restrictive resource limits. Thus, there's no reason to restrict
> processes from moving to direct descendants based on whether or not they
> have cgroup.procs write access to their current cgroup.
> 
> This is important for unprivileged subtree management, as it allows
> unprivileged processes to move to their newly create subtrees.

I don't think we can do this as this allows a sub-cgroup to steal an
ancestor's process whether the ancestor likes it or not.  A process
being put in a context where it's more restricted without whatever is
managing that part of cgroup hierarchy is not ok, at all.  Please also
note that nobody expects its processes to be stolen underneath it.
This would be a management nightmare.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]         ` <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-20 22:58           ` Aleksa Sarai
       [not found]             ` <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-20 22:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

>> If we're moving from a parent to a direct descendant, the only end
>> result (on cgroupv2 hierarchies) is that the process experiences more
>> restrictive resource limits. Thus, there's no reason to restrict
>> processes from moving to direct descendants based on whether or not they
>> have cgroup.procs write access to their current cgroup.
>>
>> This is important for unprivileged subtree management, as it allows
>> unprivileged processes to move to their newly create subtrees.
>
> I don't think we can do this as this allows a sub-cgroup to steal an
> ancestor's process whether the ancestor likes it or not.  A process
> being put in a context where it's more restricted without whatever is
> managing that part of cgroup hierarchy is not ok, at all.  Please also
> note that nobody expects its processes to be stolen underneath it.
> This would be a management nightmare.

I'm not sure what you mean by "steal". The user doing the migration owns 
the process, so I would argue that they aren't "stealing" anything. 
While a higher level process might not know where precisely in the 
hierarchy the process is, they'll know it that it must be a sub-cgroup 
of the one they were put in (meaning the parent can still impose 
restrictions without any issue).

If you want, we can make it so that an unprivileged user migrating 
processes to a child cgroup only works if you're in the same cgroup 
namespace (and have CAP_SYS_ADMIN in the pinned user namespace, etc). 
The current setup would obviously still work, but you'd add a permission 
for users that just want to be able to limit their own processes. IIRC 
we need to update cgroup_procs_write_permission() anyway. By having the 
cgroup namespace requirement, you'd definitely have to "own" the process 
in every sense of the word I can imagine.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]             ` <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-20 23:02               ` Tejun Heo
       [not found]                 ` <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-20 23:02 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

Hello, Aleksa.

On Thu, Jul 21, 2016 at 08:58:32AM +1000, Aleksa Sarai wrote:
> I'm not sure what you mean by "steal". The user doing the migration owns the

In the sense that the ancestor cgroup can be modified by one of its
descendants even when that descendant doesn't have enough permission
to modify the ancestor.

> process, so I would argue that they aren't "stealing" anything. While a
> higher level process might not know where precisely in the hierarchy the
> process is, they'll know it that it must be a sub-cgroup of the one they
> were put in (meaning the parent can still impose restrictions without any
> issue).

Hmmm... it's not just about the ownership of the process itself.  If
it had been, we wouldn't have bothered with permission model on cgroup
hierarchy itself.  It's also about who is allowed to modify a given
cgroup and what you're proposing violates that.

> If you want, we can make it so that an unprivileged user migrating processes
> to a child cgroup only works if you're in the same cgroup namespace (and
> have CAP_SYS_ADMIN in the pinned user namespace, etc). The current setup
> would obviously still work, but you'd add a permission for users that just
> want to be able to limit their own processes. IIRC we need to update
> cgroup_procs_write_permission() anyway. By having the cgroup namespace
> requirement, you'd definitely have to "own" the process in every sense of
> the word I can imagine.

Maybe I'm misunderstanding but I can't see how that would change the
situation in a significant way.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                 ` <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2016-07-20 23:18                   ` Aleksa Sarai
       [not found]                     ` <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-20 23:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

>> process, so I would argue that they aren't "stealing" anything. While a
>> higher level process might not know where precisely in the hierarchy the
>> process is, they'll know it that it must be a sub-cgroup of the one they
>> were put in (meaning the parent can still impose restrictions without any
>> issue).
>
> Hmmm... it's not just about the ownership of the process itself.  If
> it had been, we wouldn't have bothered with permission model on cgroup
> hierarchy itself.  It's also about who is allowed to modify a given
> cgroup and what you're proposing violates that.

I feel like the permission model makes sense in certain cases (the 
common ancestor restriction, as well as the ability for a parent to 
apply limits to children by setting its own limits). Neither of those 
are violated (if you read the commit that introduced the common ancestor 
restriction).

Maybe if you give me a usecase of when it might be important that a 
process must not be able to move to a sub-cgroup of its current one, I 
might be able to understand your concerns? From my perspective, I think 
that's actually quite useful.

>> If you want, we can make it so that an unprivileged user migrating processes
>> to a child cgroup only works if you're in the same cgroup namespace (and
>> have CAP_SYS_ADMIN in the pinned user namespace, etc). The current setup
>> would obviously still work, but you'd add a permission for users that just
>> want to be able to limit their own processes. IIRC we need to update
>> cgroup_procs_write_permission() anyway. By having the cgroup namespace
>> requirement, you'd definitely have to "own" the process in every sense of
>> the word I can imagine.
>
> Maybe I'm misunderstanding but I can't see how that would change the
> situation in a significant way.

Well, it would avoid the issue of a process being moved against *its* 
will. The process would have to be complicit in joining (or unsharing) a 
cgroup namespace. I'm not sure I really agree with the argument that a 
higher level process should be able to stop a process from imposing more 
*stringent* limits on itself if the process is complicit in setting 
those limits (see above).

The reason I'm doing this is so that we might be able to _practically_ 
use cgroups as an unprivileged user (something that will almost 
certainly be useful to not just the container crowd, but people also 
planning on using cgroups as advanced forms of rlimits).

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                     ` <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-20 23:19                       ` Tejun Heo
       [not found]                         ` <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-20 23:19 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

Hello, Aleksa.

On Thu, Jul 21, 2016 at 09:18:59AM +1000, Aleksa Sarai wrote:
> I feel like the permission model makes sense in certain cases (the common
> ancestor restriction, as well as the ability for a parent to apply limits to
> children by setting its own limits). Neither of those are violated (if you
> read the commit that introduced the common ancestor restriction).
>
> Maybe if you give me a usecase of when it might be important that a process
> must not be able to move to a sub-cgroup of its current one, I might be able
> to understand your concerns? From my perspective, I think that's actually
> quite useful.

cgroup is used to keep track of which processes belong where and
allowing processes to be moved out of its cgroup like this would be
surprising to say the least.

> > Maybe I'm misunderstanding but I can't see how that would change the
> > situation in a significant way.
> 
> Well, it would avoid the issue of a process being moved against *its* will.
> The process would have to be complicit in joining (or unsharing) a cgroup
> namespace. I'm not sure I really agree with the argument that a higher level
> process should be able to stop a process from imposing more *stringent*
> limits on itself if the process is complicit in setting those limits (see
> above).
> 
> The reason I'm doing this is so that we might be able to _practically_ use
> cgroups as an unprivileged user (something that will almost certainly be
> useful to not just the container crowd, but people also planning on using
> cgroups as advanced forms of rlimits).

I don't get why we need this fragile dance with permissions at all
when the same functionality can be achieved by delegating explicitly.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                         ` <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2016-07-21  7:49                           ` Aleksa Sarai
       [not found]                             ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-21  7:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

>> I feel like the permission model makes sense in certain cases (the common
>> ancestor restriction, as well as the ability for a parent to apply limits to
>> children by setting its own limits). Neither of those are violated (if you
>> read the commit that introduced the common ancestor restriction).
>>
>> Maybe if you give me a usecase of when it might be important that a process
>> must not be able to move to a sub-cgroup of its current one, I might be able
>> to understand your concerns? From my perspective, I think that's actually
>> quite useful.
>
> cgroup is used to keep track of which processes belong where and
> allowing processes to be moved out of its cgroup like this would be
> surprising to say the least.

Would you find it acceptable if we added a bit that would make this not 
happen (you could specify that a cgroup should not allow a process to 
move itself to a sub-cgroup)? Or an aggregate cgroup.procs that gives 
you all of the processes in the entire branch of the tree? Surely this 
is something that can be fixed without unnecessarily restricting users 
from doing useful things.

>> The reason I'm doing this is so that we might be able to _practically_ use
>> cgroups as an unprivileged user (something that will almost certainly be
>> useful to not just the container crowd, but people also planning on using
>> cgroups as advanced forms of rlimits).
>
> I don't get why we need this fragile dance with permissions at all
> when the same functionality can be achieved by delegating explicitly.

The key words being "unprivileged user". Currently, if I am a regular 
user on a system and I want to use the freezer cgroup to pause a process 
I am running, I have to *go to the administrator and ask them to give me 
permission to do that*. Why is that necessary? I find it quite troubling 
that the usecase of an ordinary user on a system trying to use something 
as useful as cgroups is considered to be "solved" by asking your 
administrator (or systemd) to do it for you. "Delegating explicitly" is 
punting on the problem, by saying "just get the administrator to do the 
setup for you". What if you don't have the opportunity to do that, and 
it takes you 4 weeks of sending emails for you to get the administrator 
to do _anything_?

This is something I'm trying to fix with my recent work with rootless 
containers (and quite a few other people are trying to fix it too). 
Currently we just simply can't do certain operations as an unprivileged 
user that would be possible *if we could just use cgroups*. Things like 
the freezer cgroup would be invaluable for containers, and I guarantee 
that the Chromium and Firefox folks would find it useful to be able to 
limit browser processes in a similar way.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                             ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-21 14:33                               ` Serge E. Hallyn
       [not found]                                 ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  2016-07-21 14:52                               ` Tejun Heo
  1 sibling, 1 reply; 34+ messages in thread
From: Serge E. Hallyn @ 2016-07-21 14:33 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

Quoting Aleksa Sarai (asarai-l3A5Bk7waGM@public.gmane.org):
> >>I feel like the permission model makes sense in certain cases (the common
> >>ancestor restriction, as well as the ability for a parent to apply limits to
> >>children by setting its own limits). Neither of those are violated (if you
> >>read the commit that introduced the common ancestor restriction).
> >>
> >>Maybe if you give me a usecase of when it might be important that a process
> >>must not be able to move to a sub-cgroup of its current one, I might be able
> >>to understand your concerns? From my perspective, I think that's actually
> >>quite useful.
> >
> >cgroup is used to keep track of which processes belong where and
> >allowing processes to be moved out of its cgroup like this would be
> >surprising to say the least.
> 
> Would you find it acceptable if we added a bit that would make this
> not happen (you could specify that a cgroup should not allow a
> process to move itself to a sub-cgroup)? Or an aggregate
> cgroup.procs that gives you all of the processes in the entire
> branch of the tree? Surely this is something that can be fixed
> without unnecessarily restricting users from doing useful things.
> 
> >>The reason I'm doing this is so that we might be able to _practically_ use
> >>cgroups as an unprivileged user (something that will almost certainly be
> >>useful to not just the container crowd, but people also planning on using
> >>cgroups as advanced forms of rlimits).
> >
> >I don't get why we need this fragile dance with permissions at all
> >when the same functionality can be achieved by delegating explicitly.
> 
> The key words being "unprivileged user". Currently, if I am a
> regular user on a system and I want to use the freezer cgroup to
> pause a process I am running, I have to *go to the administrator and
> ask them to give me permission to do that*. Why is that necessary? I

Ths is of course solvable using something like libpam-cgfs or
libpam-cgm (and others).  Since this sounds like a question of
policy, not mechanism, userspace seems like the right place.  Is
there a downside to that (or, as Tejun put it, "delegating explicitly")?

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                 ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2016-07-21 14:37                                   ` Aleksa Sarai
       [not found]                                     ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org>
  2016-07-21 14:51                                   ` James Bottomley
  1 sibling, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-21 14:37 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

>>>> I feel like the permission model makes sense in certain cases (the common
>>>> ancestor restriction, as well as the ability for a parent to apply limits to
>>>> children by setting its own limits). Neither of those are violated (if you
>>>> read the commit that introduced the common ancestor restriction).
>>>>
>>>> Maybe if you give me a usecase of when it might be important that a process
>>>> must not be able to move to a sub-cgroup of its current one, I might be able
>>>> to understand your concerns? From my perspective, I think that's actually
>>>> quite useful.
>>>
>>> cgroup is used to keep track of which processes belong where and
>>> allowing processes to be moved out of its cgroup like this would be
>>> surprising to say the least.
>>
>> Would you find it acceptable if we added a bit that would make this
>> not happen (you could specify that a cgroup should not allow a
>> process to move itself to a sub-cgroup)? Or an aggregate
>> cgroup.procs that gives you all of the processes in the entire
>> branch of the tree? Surely this is something that can be fixed
>> without unnecessarily restricting users from doing useful things.
>>
>>>> The reason I'm doing this is so that we might be able to _practically_ use
>>>> cgroups as an unprivileged user (something that will almost certainly be
>>>> useful to not just the container crowd, but people also planning on using
>>>> cgroups as advanced forms of rlimits).
>>>
>>> I don't get why we need this fragile dance with permissions at all
>>> when the same functionality can be achieved by delegating explicitly.
>>
>> The key words being "unprivileged user". Currently, if I am a
>> regular user on a system and I want to use the freezer cgroup to
>> pause a process I am running, I have to *go to the administrator and
>> ask them to give me permission to do that*. Why is that necessary? I
>
> Ths is of course solvable using something like libpam-cgfs or
> libpam-cgm (and others).  Since this sounds like a question of
> policy, not mechanism, userspace seems like the right place.  Is
> there a downside to that (or, as Tejun put it, "delegating explicitly")?

Having a PAM module requires getting an administrator to install the PAM 
module (and also presumably audit it, not to mention convincing them 
that your requirement to use containers are significant enough for them 
to do any work). It's the same problem IMO. I understand that LXC allows 
you to do this, but it requires that you get an administrator to 
*install* and support LXC (as well as the shadow-utils setuid binaries 
too). There are cases where you don't have the freedom to do that, and 
also "just get someone to give you privileges temporarily" is again 
punting on the problem.

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                     ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-21 15:01                                       ` Tejun Heo
  2016-07-21 15:09                                       ` Serge E. Hallyn
  1 sibling, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 15:01 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Serge E. Hallyn, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

Hello, Aleksa.

On Fri, Jul 22, 2016 at 12:37:42AM +1000, Aleksa Sarai wrote:
> > Ths is of course solvable using something like libpam-cgfs or
> > libpam-cgm (and others).  Since this sounds like a question of
> > policy, not mechanism, userspace seems like the right place.  Is
> > there a downside to that (or, as Tejun put it, "delegating explicitly")?
> 
> Having a PAM module requires getting an administrator to install the PAM
> module (and also presumably audit it, not to mention convincing them that
> your requirement to use containers are significant enough for them to do any
> work). It's the same problem IMO. I understand that LXC allows you to do
> this, but it requires that you get an administrator to *install* and support
> LXC (as well as the shadow-utils setuid binaries too). There are cases where
> you don't have the freedom to do that, and also "just get someone to give
> you privileges temporarily" is again punting on the problem.

The administrator has to install a new kernel to get this feature from
kernel side too.  I don't think "to bypass admin" is a strong argument
for a new kernel feature especially when it's likely to cause subtle
issues as in this case.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                     ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org>
  2016-07-21 15:01                                       ` Tejun Heo
@ 2016-07-21 15:09                                       ` Serge E. Hallyn
  1 sibling, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2016-07-21 15:09 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Serge E. Hallyn, Tejun Heo, Greg Kroah-Hartman, Li Zefan,
	Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

Quoting Aleksa Sarai (asarai-l3A5Bk7waGM@public.gmane.org):
> >>>>I feel like the permission model makes sense in certain cases (the common
> >>>>ancestor restriction, as well as the ability for a parent to apply limits to
> >>>>children by setting its own limits). Neither of those are violated (if you
> >>>>read the commit that introduced the common ancestor restriction).
> >>>>
> >>>>Maybe if you give me a usecase of when it might be important that a process
> >>>>must not be able to move to a sub-cgroup of its current one, I might be able
> >>>>to understand your concerns? From my perspective, I think that's actually
> >>>>quite useful.
> >>>
> >>>cgroup is used to keep track of which processes belong where and
> >>>allowing processes to be moved out of its cgroup like this would be
> >>>surprising to say the least.
> >>
> >>Would you find it acceptable if we added a bit that would make this
> >>not happen (you could specify that a cgroup should not allow a
> >>process to move itself to a sub-cgroup)? Or an aggregate
> >>cgroup.procs that gives you all of the processes in the entire
> >>branch of the tree? Surely this is something that can be fixed
> >>without unnecessarily restricting users from doing useful things.
> >>
> >>>>The reason I'm doing this is so that we might be able to _practically_ use
> >>>>cgroups as an unprivileged user (something that will almost certainly be
> >>>>useful to not just the container crowd, but people also planning on using
> >>>>cgroups as advanced forms of rlimits).
> >>>
> >>>I don't get why we need this fragile dance with permissions at all
> >>>when the same functionality can be achieved by delegating explicitly.
> >>
> >>The key words being "unprivileged user". Currently, if I am a
> >>regular user on a system and I want to use the freezer cgroup to
> >>pause a process I am running, I have to *go to the administrator and
> >>ask them to give me permission to do that*. Why is that necessary? I
> >
> >Ths is of course solvable using something like libpam-cgfs or
> >libpam-cgm (and others).  Since this sounds like a question of
> >policy, not mechanism, userspace seems like the right place.  Is
> >there a downside to that (or, as Tejun put it, "delegating explicitly")?
> 
> Having a PAM module requires getting an administrator to install the
> PAM module (and also presumably audit it, not to mention convincing
> them that your requirement to use containers are significant enough

Right, but that's also the upside.  Just like user namespaces, it *is*
possible that there remain exploitable situations when cgroups are
delegated, and it is up to the admin, not the user, to gauge how
averse they are to that risk.

(Fwiw obviously I am very sympathetic to your goals :)

> for them to do any work). It's the same problem IMO. I understand
> that LXC allows you to do this, but it requires that you get an
> administrator to *install* and support LXC (as well as the
> shadow-utils setuid binaries too). There are cases where you don't
> have the freedom to do that, and also "just get someone to give you
> privileges temporarily" is again punting on the problem.
> 
> -- 
> Aleksa Sarai
> Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                 ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  2016-07-21 14:37                                   ` Aleksa Sarai
@ 2016-07-21 14:51                                   ` James Bottomley
       [not found]                                     ` <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: James Bottomley @ 2016-07-21 14:51 UTC (permalink / raw)
  To: Serge E. Hallyn, Aleksa Sarai
  Cc: Tejun Heo, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

On Thu, 2016-07-21 at 09:33 -0500, Serge E. Hallyn wrote:
> Quoting Aleksa Sarai (asarai-l3A5Bk7waGM@public.gmane.org):

> > > > The reason I'm doing this is so that we might be able to
> > > > _practically_ use cgroups as an unprivileged user (something
> > > > that will almost certainly be useful to not just the container
> > > > crowd, but people also planning on using cgroups as advanced
> > > > forms of rlimits).
> > > 
> > > I don't get why we need this fragile dance with permissions at 
> > > all when the same functionality can be achieved by delegating
> > > explicitly.
> > 
> > The key words being "unprivileged user". Currently, if I am a
> > regular user on a system and I want to use the freezer cgroup to
> > pause a process I am running, I have to *go to the administrator 
> > and ask them to give me permission to do that*. Why is that
> > necessary?
> 
> Ths is of course solvable using something like libpam-cgfs or
> libpam-cgm (and others).  Since this sounds like a question of
> policy, not mechanism, userspace seems like the right place.  Is
> there a downside to that (or, as Tejun put it, "delegating 
> explicitly")?

Unprivileged containers should "just work" by default as much as
possible.  There are cases where they can't and policy input is
required, like the userns mapping to additional ids beyond the current
one, we can still set up the default case without intervention (a
single mapping root to current id).

What I haven't really heard yet in the debate is the policy reason why
an unprivileged user shouldn't set up their own cgroups as children of
the current ones (inheriting the constraints).

I have heard

 * it would give power to move other tasks to more rigid constraints. 
    To which the answer is only to allow movememnt of tasks in the
   current cgroupns
 * It violates the permissions delegation model.  This one doesn't
   really make too much sense to me: in the same way the userns is root
   in its own domain, cgroups ns is effective root for the restricted
   cgroups (and only for processes within its ns).

Perhaps the question should be asked the other way around:  if we were
explicitly delegating permission to every user in the system to set up
their own sub cgroups, how would you advise it be done?

James

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                     ` <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-07-21 14:59                                       ` Tejun Heo
       [not found]                                         ` <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 14:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Serge E. Hallyn, Aleksa Sarai, Greg Kroah-Hartman, Li Zefan,
	Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello, James.

On Thu, Jul 21, 2016 at 07:51:49AM -0700, James Bottomley wrote:
> What I haven't really heard yet in the debate is the policy reason why
> an unprivileged user shouldn't set up their own cgroups as children of
> the current ones (inheriting the constraints).

It's not even about policies.  The interface just can't support
operations like this in a robust way.  We can try to hack enough holes
at it to make some scenarios work but it's all but guaranteed that
such approach is gonna cause painful long-term issues.

> I have heard
> 
>  * it would give power to move other tasks to more rigid constraints. 
>     To which the answer is only to allow movememnt of tasks in the
>    current cgroupns
>  * It violates the permissions delegation model.  This one doesn't
>    really make too much sense to me: in the same way the userns is root
>    in its own domain, cgroups ns is effective root for the restricted
>    cgroups (and only for processes within its ns).
> 
> Perhaps the question should be asked the other way around:  if we were
> explicitly delegating permission to every user in the system to set up
> their own sub cgroups, how would you advise it be done?

Coordinate in userspace.  Request whatever is managing the cgroup
hierarchy to set up delegation.  It's not like permission model is
fully contained in kernel on modern systems anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                         ` <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-21 15:07                                           ` Aleksa Sarai
  2016-07-21 15:04                                             ` Tejun Heo
  0 siblings, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-21 15:07 UTC (permalink / raw)
  To: Tejun Heo, James Bottomley
  Cc: Serge E. Hallyn, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

>> I have heard
>>
>>  * it would give power to move other tasks to more rigid constraints.
>>     To which the answer is only to allow movememnt of tasks in the
>>    current cgroupns
>>  * It violates the permissions delegation model.  This one doesn't
>>    really make too much sense to me: in the same way the userns is root
>>    in its own domain, cgroups ns is effective root for the restricted
>>    cgroups (and only for processes within its ns).
>>
>> Perhaps the question should be asked the other way around:  if we were
>> explicitly delegating permission to every user in the system to set up
>> their own sub cgroups, how would you advise it be done?
>
> Coordinate in userspace.  Request whatever is managing the cgroup
> hierarchy to set up delegation.  It's not like permission model is
> fully contained in kernel on modern systems anyway.

My experience with certain systemdaemons' cgroup handling doesn't 
inspire confidence :/ (from the runC side, we've had nothing but 
issues). Also, how do you even boot into a cgroupv2 system with systemd 
(I started backporting patches to openSUSE, but it's still not booting)?

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
  2016-07-21 15:07                                           ` Aleksa Sarai
@ 2016-07-21 15:04                                             ` Tejun Heo
  0 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 15:04 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: James Bottomley, Serge E. Hallyn, Greg Kroah-Hartman, Li Zefan,
	Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel, cgroups, Christian Brauner, dev

Hello, Aleksa.

On Fri, Jul 22, 2016 at 01:07:13AM +1000, Aleksa Sarai wrote:
> > Coordinate in userspace.  Request whatever is managing the cgroup
> > hierarchy to set up delegation.  It's not like permission model is
> > fully contained in kernel on modern systems anyway.
> 
> My experience with certain systemdaemons' cgroup handling doesn't inspire
> confidence :/ (from the runC side, we've had nothing but issues). Also, how

Fix it then.  Working around bugs in userland isn't a justifiable
rationale for adding new kernel features.

> do you even boot into a cgroupv2 system with systemd (I started backporting
> patches to openSUSE, but it's still not booting)?

With devel branch, just passing in the unified hierarchy boot param
seems to work fine here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                             ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org>
  2016-07-21 14:33                               ` Serge E. Hallyn
@ 2016-07-21 14:52                               ` Tejun Heo
       [not found]                                 ` <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 14:52 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ, James Bottomley

Hello, Aleksa.

On Thu, Jul 21, 2016 at 05:49:36PM +1000, Aleksa Sarai wrote:
> > > The reason I'm doing this is so that we might be able to _practically_ use
> > > cgroups as an unprivileged user (something that will almost certainly be
> > > useful to not just the container crowd, but people also planning on using
> > > cgroups as advanced forms of rlimits).
> > 
> > I don't get why we need this fragile dance with permissions at all
> > when the same functionality can be achieved by delegating explicitly.
> 
> The key words being "unprivileged user". Currently, if I am a regular user
> on a system and I want to use the freezer cgroup to pause a process I am
> running, I have to *go to the administrator and ask them to give me
> permission to do that*. Why is that necessary? I find it quite troubling
> that the usecase of an ordinary user on a system trying to use something as
> useful as cgroups is considered to be "solved" by asking your administrator
> (or systemd) to do it for you. "Delegating explicitly" is punting on the
> problem, by saying "just get the administrator to do the setup for you".
> What if you don't have the opportunity to do that, and it takes you 4 weeks
> of sending emails for you to get the administrator to do _anything_?
> 
> This is something I'm trying to fix with my recent work with rootless
> containers (and quite a few other people are trying to fix it too).
> Currently we just simply can't do certain operations as an unprivileged user
> that would be possible *if we could just use cgroups*. Things like the
> freezer cgroup would be invaluable for containers, and I guarantee that the
> Chromium and Firefox folks would find it useful to be able to limit browser
> processes in a similar way.

I understand what you're trying to achieve but don't think cgroup's
filesystem interface can accomodate that.  To support that level of
automatic delegation, the API should be providing enough isolation so
that operations in one domain (user-specific operations) are
transparent from the other (system-wide administration), which simply
isn't true for cgroupfs.  As a simple example, imagine a process being
moved to another cgroup racing against the special operations you're
describing ahead.  Both sides are multi-step operations and there are
no ways of synchronizing against each other from kernel side and the
outcomes can easily be non-sensical.

It is unfortunate but we started with and are bound to carry the
current vfs based interface which was never designed to support the
use cases you're describing in a seamless way and that's why cgroup
supports explicit delegation so that userland can take over the
necessary coordination and implement more complex operations atop.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                 ` <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-21 15:04                                   ` James Bottomley
       [not found]                                     ` <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: James Bottomley @ 2016-07-21 15:04 UTC (permalink / raw)
  To: Tejun Heo, Aleksa Sarai
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

On Thu, 2016-07-21 at 10:52 -0400, Tejun Heo wrote:
> Hello, Aleksa.
> 
> On Thu, Jul 21, 2016 at 05:49:36PM +1000, Aleksa Sarai wrote:
> > > > The reason I'm doing this is so that we might be able to 
> > > > _practically_ use cgroups as an unprivileged user (something 
> > > > that will almost certainly be useful to not just the container 
> > > > crowd, but people also planning on using cgroups as advanced
> > > > forms of rlimits).
> > > 
> > > I don't get why we need this fragile dance with permissions at 
> > > all when the same functionality can be achieved by delegating
> > > explicitly.
> > 
> > The key words being "unprivileged user". Currently, if I am a 
> > regular user on a system and I want to use the freezer cgroup to 
> > pause a process I am running, I have to *go to the administrator 
> > and ask them to give me permission to do that*. Why is that 
> > necessary? I find it quite troubling that the usecase of an 
> > ordinary user on a system trying to use something as useful as 
> > cgroups is considered to be "solved" by asking your administrator
> > (or systemd) to do it for you. "Delegating explicitly" is punting 
> > on the problem, by saying "just get the administrator to do the 
> > setup for you". What if you don't have the opportunity to do that, 
> > and it takes you 4 weeks of sending emails for you to get the 
> > administrator to do _anything_?
> > 
> > This is something I'm trying to fix with my recent work with 
> > rootless containers (and quite a few other people are trying to fix 
> > it too). Currently we just simply can't do certain operations as an
> > unprivileged user that would be possible *if we could just use 
> > cgroups*. Things like the freezer cgroup would be invaluable for 
> > containers, and I guarantee that the Chromium and Firefox folks 
> > would find it useful to be able to limit browser processes in a
> > similar way.
> 
> I understand what you're trying to achieve but don't think cgroup's
> filesystem interface can accomodate that.  To support that level of
> automatic delegation, the API should be providing enough isolation so
> that operations in one domain (user-specific operations) are
> transparent from the other (system-wide administration), which simply
> isn't true for cgroupfs.  As a simple example, imagine a process 
> being moved to another cgroup racing against the special operations 
> you're describing ahead.  Both sides are multi-step operations and 
> there are no ways of synchronizing against each other from kernel 
> side and the outcomes can easily be non-sensical.

So if I understand, it's not about actually moving the tasks: echoing
the pid to the tasks file is atomic and we can mediate races there. 
 It's about the debris left behind if the admin (or someone with
delegated authority) moves the task to a wholly different cgroup.

Now we have a cgroup directory in the old cgroup, which the current
task has been removed from, for which the current user has permissions
and could then move the task back to.  Is that the essence of the
problem?

James

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                     ` <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-07-21 15:07                                       ` Tejun Heo
       [not found]                                         ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 15:07 UTC (permalink / raw)
  To: James Bottomley
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello, James.

On Thu, Jul 21, 2016 at 08:04:16AM -0700, James Bottomley wrote:
> > I understand what you're trying to achieve but don't think cgroup's
> > filesystem interface can accomodate that.  To support that level of
> > automatic delegation, the API should be providing enough isolation so
> > that operations in one domain (user-specific operations) are
> > transparent from the other (system-wide administration), which simply
> > isn't true for cgroupfs.  As a simple example, imagine a process 
> > being moved to another cgroup racing against the special operations 
> > you're describing ahead.  Both sides are multi-step operations and 
> > there are no ways of synchronizing against each other from kernel 
> > side and the outcomes can easily be non-sensical.
> 
> So if I understand, it's not about actually moving the tasks: echoing
> the pid to the tasks file is atomic and we can mediate races there. 

Yeah, each operation is atomic but most meaningul operations are
multi-step.

>  It's about the debris left behind if the admin (or someone with
> delegated authority) moves the task to a wholly different cgroup.
> 
> Now we have a cgroup directory in the old cgroup, which the current
> task has been removed from, for which the current user has permissions
> and could then move the task back to.  Is that the essence of the
> problem?

That'd be one side.  The other side is the one moving.  Let's say the
system admin thing wants to move all processe from A proper to B.  It
would do that by draining processes from A's procs file into B's and
even that is multistep and can race.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                         ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-21 15:16                                           ` James Bottomley
  2016-07-21 15:26                                             ` Tejun Heo
  2016-07-22  8:24                                           ` Aleksa Sarai
  1 sibling, 1 reply; 34+ messages in thread
From: James Bottomley @ 2016-07-21 15:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

On Thu, 2016-07-21 at 11:07 -0400, Tejun Heo wrote:
> Hello, James.
> 
> On Thu, Jul 21, 2016 at 08:04:16AM -0700, James Bottomley wrote:
> > > I understand what you're trying to achieve but don't think 
> > > cgroup's filesystem interface can accomodate that.  To support 
> > > that level of automatic delegation, the API should be providing 
> > > enough isolation so that operations in one domain (user-specific 
> > > operations) are transparent from the other (system-wide 
> > > administration), which simply isn't true for cgroupfs.  As a 
> > > simple example, imagine a process being moved to another cgroup 
> > > racing against the special operations you're describing ahead. 
> > >  Both sides are multi-step operations and there are no ways of 
> > > synchronizing against each other from kernel side and the
> > > outcomes can easily be non-sensical.
> > 
> > So if I understand, it's not about actually moving the tasks: 
> > echoing the pid to the tasks file is atomic and we can mediate
> > races there.
 
> 
> Yeah, each operation is atomic but most meaningul operations are
> multi-step.
> 
> >  It's about the debris left behind if the admin (or someone with
> > delegated authority) moves the task to a wholly different cgroup.
> > 
> > Now we have a cgroup directory in the old cgroup, which the current
> > task has been removed from, for which the current user has 
> > permissions and could then move the task back to.  Is that the 
> > essence of the problem?
> 
> That'd be one side.  The other side is the one moving.  Let's say the
> system admin thing wants to move all processe from A proper to B.  It
> would do that by draining processes from A's procs file into B's and
> even that is multistep and can race.

So the second part is that once we allow the creation of
subdirectories, there's no unified tasks file, so there's no way of
draining A proper without enumerating and descending into the cgroupns
created subtrees in A?

James

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
  2016-07-21 15:16                                           ` James Bottomley
@ 2016-07-21 15:26                                             ` Tejun Heo
       [not found]                                               ` <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 15:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel, cgroups,
	Christian Brauner, dev

Hello, James.

On Thu, Jul 21, 2016 at 08:16:34AM -0700, James Bottomley wrote:
> > That'd be one side.  The other side is the one moving.  Let's say the
> > system admin thing wants to move all processe from A proper to B.  It
> > would do that by draining processes from A's procs file into B's and
> > even that is multistep and can race.
> 
> So the second part is that once we allow the creation of
> subdirectories, there's no unified tasks file, so there's no way of
> draining A proper without enumerating and descending into the cgroupns
> created subtrees in A?

Not that.  If it races, it will end up moving processes which are no
longer in A proper.  Such operations or distinctions might not be
meaningful under many circumstances but it'd be a pretty big hole in
the API to create and I can't really declare these holes are gonna be
okay with confidence.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                               ` <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-21 15:34                                                 ` James Bottomley
       [not found]                                                   ` <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: James Bottomley @ 2016-07-21 15:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

On Thu, 2016-07-21 at 11:26 -0400, Tejun Heo wrote:
> Hello, James.
> 
> On Thu, Jul 21, 2016 at 08:16:34AM -0700, James Bottomley wrote:
> > > That'd be one side.  The other side is the one moving.  Let's say 
> > > the system admin thing wants to move all processe from A proper 
> > > to B.   It would do that by draining processes from A's procs 
> > > file into B's and even that is multistep and can race.
> > 
> > So the second part is that once we allow the creation of
> > subdirectories, there's no unified tasks file, so there's no way of
> > draining A proper without enumerating and descending into the 
> > cgroupns created subtrees in A?
> 
> Not that.  If it races, it will end up moving processes which are no
> longer in A proper.

So if I as the cgroup ns owner am moving a task from A to A_subdir, the
admin scanning tasks in all of A may miss this task in motion because
all the tasks files can't be scanned atomically?

James

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                                   ` <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-07-21 15:50                                                     ` Tejun Heo
       [not found]                                                       ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  2016-07-22  8:30                                                       ` Aleksa Sarai
  0 siblings, 2 replies; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 15:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello, James.

On Thu, Jul 21, 2016 at 08:34:36AM -0700, James Bottomley wrote:
> So if I as the cgroup ns owner am moving a task from A to A_subdir, the
> admin scanning tasks in all of A may miss this task in motion because
> all the tasks files can't be scanned atomically?

So, the admin just wants to move processes from A and only A to B.  It
doesn't wanna interfere with processes in the subdirs or on-going ns
operations, but if the race occurs, both A -> B migration and ns
subdir operation would succeed and the end result would be something
neither expects.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                                       ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-21 18:16                                                         ` James Bottomley
       [not found]                                                           ` <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: James Bottomley @ 2016-07-21 18:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

On Thu, 2016-07-21 at 11:50 -0400, Tejun Heo wrote:
> Hello, James.
> 
> On Thu, Jul 21, 2016 at 08:34:36AM -0700, James Bottomley wrote:
> > So if I as the cgroup ns owner am moving a task from A to A_subdir, 
> > the admin scanning tasks in all of A may miss this task in motion
> > because all the tasks files can't be scanned atomically?
> 
> So, the admin just wants to move processes from A and only A to B. 
>  It doesn't wanna interfere with processes in the subdirs or on-going 
> ns operations, but if the race occurs, both A -> B migration and ns
> subdir operation would succeed and the end result would be something
> neither expects.

OK so a theoretical (not saying it's implementable, we'll have to
explore that) way of fixing all of this is to have separate views of
the tree.  If the admin always saw everything in A, even if the
cgroupns had created subdirectories in its own namespace.  That way
there'd be no race ever in the admin's view (because it's the view they
created and would expect to see).  All sub cgroup activity would only
be visible to tasks in the new cgroupns (we'd probably have to have
them make this visible by mounting a new cgroup tree).

James




^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                                           ` <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2016-07-21 21:06                                                             ` Tejun Heo
  0 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2016-07-21 21:06 UTC (permalink / raw)
  To: James Bottomley
  Cc: Aleksa Sarai, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello,

On Thu, Jul 21, 2016 at 11:16:42AM -0700, James Bottomley wrote:
> OK so a theoretical (not saying it's implementable, we'll have to
> explore that) way of fixing all of this is to have separate views of
> the tree.  If the admin always saw everything in A, even if the
> cgroupns had created subdirectories in its own namespace.  That way
> there'd be no race ever in the admin's view (because it's the view they
> created and would expect to see).  All sub cgroup activity would only
> be visible to tasks in the new cgroupns (we'd probably have to have
> them make this visible by mounting a new cgroup tree).

Yeah, something like that.  The two domains of operation need to be
transparent to each other so that things taking place at system level
doesn't interfere with user level operations and vice-versa.  It's
likely that implementing something like that within filesystem based
interface won't work out too well.  There are too many expected
behaviors from being a filesystem which don't quite agree with such
abstraction.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
  2016-07-21 15:50                                                     ` Tejun Heo
       [not found]                                                       ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2016-07-22  8:30                                                       ` Aleksa Sarai
       [not found]                                                         ` <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-22  8:30 UTC (permalink / raw)
  To: Tejun Heo, James Bottomley
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel, cgroups,
	Christian Brauner, dev

>> So if I as the cgroup ns owner am moving a task from A to A_subdir, the
>> admin scanning tasks in all of A may miss this task in motion because
>> all the tasks files can't be scanned atomically?
>
> So, the admin just wants to move processes from A and only A to B.  It
> doesn't wanna interfere with processes in the subdirs or on-going ns
> operations, but if the race occurs, both A -> B migration and ns
> subdir operation would succeed and the end result would be something
> neither expects.

Just to be clear, the "ns subdir operation" is a cgroup namespaced 
process moving A -> A_subdir which is racing against some administrative 
process moving everything from A -> B (but not wanting to move A -> 
A_subdir)?

So should there be policy within the kernel to not permit a process 
outside a cgroup namespace to move processes inside the namespace? Or 
would you be concerned about people escaping the administrator's 
attempts to reorganise the hierarchy?

What if we extended rename(2) so that it /does/ allow for reorganisation 
of the hierarchy? So an administrator could use rename to change the 
point at which a cgroupns root is rooted at, but not be able to move the 
actual processes within the cgroup namespace around? The administrator 
could also join the cgroupns (without needing to join the userns) and 
then just move things around that way?

Do any of those suggestions seem reasonable?

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                                         ` <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org>
@ 2016-07-25 18:38                                                           ` Tejun Heo
       [not found]                                                             ` <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2016-07-25 18:38 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: James Bottomley, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Hello, Aleksa.

On Fri, Jul 22, 2016 at 06:30:07PM +1000, Aleksa Sarai wrote:
> Just to be clear, the "ns subdir operation" is a cgroup namespaced process
> moving A -> A_subdir which is racing against some administrative process
> moving everything from A -> B (but not wanting to move A -> A_subdir)?

Yes.

> So should there be policy within the kernel to not permit a process outside
> a cgroup namespace to move processes inside the namespace? Or would you be
> concerned about people escaping the administrator's attempts to reorganise
> the hierarchy?

Pushed that far, I frankly can't assess what the implications and
side-effects would be.

> What if we extended rename(2) so that it /does/ allow for reorganisation of
> the hierarchy? So an administrator could use rename to change the point at
> which a cgroupns root is rooted at, but not be able to move the actual
> processes within the cgroup namespace around? The administrator could also
> join the cgroupns (without needing to join the userns) and then just move
> things around that way?
> 
> Do any of those suggestions seem reasonable?

Unfortunately not.  I get what you're trying to do and am sure we can
make some specific scenarios work with the right set of hacks and
holes, but this type of approach is very dangerous in the long term.

The downside we have now is that we need an explicit delegation from
userland and that stems from the architectural constraints of
cgroupfs.  It's not ideal but an acceptable situation.  Let's please
not riddle the whole thing with holes that we don't understand for an
inconvenience which can be worked around otherwise.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>]

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                                             ` <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2016-07-25 22:54                                                               ` Serge E. Hallyn
  0 siblings, 0 replies; 34+ messages in thread
From: Serge E. Hallyn @ 2016-07-25 22:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aleksa Sarai, James Bottomley, Greg Kroah-Hartman, Li Zefan,
	Johannes Weiner, Serge E. Hallyn, Aditya Kali, Chris Wilson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> Hello, Aleksa.
> 
> On Fri, Jul 22, 2016 at 06:30:07PM +1000, Aleksa Sarai wrote:
> > Just to be clear, the "ns subdir operation" is a cgroup namespaced process
> > moving A -> A_subdir which is racing against some administrative process
> > moving everything from A -> B (but not wanting to move A -> A_subdir)?
> 
> Yes.
> 
> > So should there be policy within the kernel to not permit a process outside
> > a cgroup namespace to move processes inside the namespace? Or would you be
> > concerned about people escaping the administrator's attempts to reorganise
> > the hierarchy?
> 
> Pushed that far, I frankly can't assess what the implications and
> side-effects would be.
> 
> > What if we extended rename(2) so that it /does/ allow for reorganisation of
> > the hierarchy? So an administrator could use rename to change the point at
> > which a cgroupns root is rooted at, but not be able to move the actual
> > processes within the cgroup namespace around? The administrator could also
> > join the cgroupns (without needing to join the userns) and then just move
> > things around that way?
> > 
> > Do any of those suggestions seem reasonable?
> 
> Unfortunately not.  I get what you're trying to do and am sure we can
> make some specific scenarios work with the right set of hacks and
> holes, but this type of approach is very dangerous in the long term.
> 
> The downside we have now is that we need an explicit delegation from
> userland and that stems from the architectural constraints of
> cgroupfs.  It's not ideal but an acceptable situation.  Let's please
> not riddle the whole thing with holes that we don't understand for an
> inconvenience which can be worked around otherwise.

It's something which distros, post-machine-install scripts, and
 post-pkg-install scripts can all very easily setup.  They have to
setup subuid and subgid too if you want safe unprivileged containers,
so I really don't feel this is so arduous.  What is the use case where
none of these are an option?  (I don't doubt that it exists since you
are pushing pretty hard)

-serge

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
       [not found]                                         ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
  2016-07-21 15:16                                           ` James Bottomley
@ 2016-07-22  8:24                                           ` Aleksa Sarai
  2016-07-25 18:44                                             ` Tejun Heo
  1 sibling, 1 reply; 34+ messages in thread
From: Aleksa Sarai @ 2016-07-22  8:24 UTC (permalink / raw)
  To: Tejun Heo, James Bottomley
  Cc: Greg Kroah-Hartman, Li Zefan, Johannes Weiner, Serge E. Hallyn,
	Aditya Kali, Chris Wilson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Christian Brauner,
	dev-IGmTWi+3HBZvNhPySn5qfx2eb7JE58TQ

>>  It's about the debris left behind if the admin (or someone with
>> delegated authority) moves the task to a wholly different cgroup.
>>
>> Now we have a cgroup directory in the old cgroup, which the current
>> task has been removed from, for which the current user has permissions
>> and could then move the task back to.  Is that the essence of the
>> problem?
>
> That'd be one side.  The other side is the one moving.  Let's say the
> system admin thing wants to move all processe from A proper to B.  It
> would do that by draining processes from A's procs file into B's and
> even that is multistep and can race.

Once freezer is ported, wouldn't that allow you to stop the processes so 
you can drain them? I understand your concern with draining, but surely 
the same races occur if you fork? How many times would you need to scan 
cgroup.procs to make sure that you didn't miss anything (and if there's 
enough processes then cgroup.procs reads aren't atomic either).

-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants
  2016-07-22  8:24                                           ` Aleksa Sarai
@ 2016-07-25 18:44                                             ` Tejun Heo
  0 siblings, 0 replies; 34+ messages in thread
From: Tejun Heo @ 2016-07-25 18:44 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: James Bottomley, Greg Kroah-Hartman, Li Zefan, Johannes Weiner,
	Serge E. Hallyn, Aditya Kali, Chris Wilson, linux-kernel, cgroups,
	Christian Brauner, dev

On Fri, Jul 22, 2016 at 06:24:25PM +1000, Aleksa Sarai wrote:
> > >  It's about the debris left behind if the admin (or someone with
> > > delegated authority) moves the task to a wholly different cgroup.
> > > 
> > > Now we have a cgroup directory in the old cgroup, which the current
> > > task has been removed from, for which the current user has permissions
> > > and could then move the task back to.  Is that the essence of the
> > > problem?
> > 
> > That'd be one side.  The other side is the one moving.  Let's say the
> > system admin thing wants to move all processe from A proper to B.  It
> > would do that by draining processes from A's procs file into B's and
> > even that is multistep and can race.
> 
> Once freezer is ported, wouldn't that allow you to stop the processes so you
> can drain them? I understand your concern with draining, but surely the same
> races occur if you fork? How many times would you need to scan cgroup.procs
> to make sure that you didn't miss anything (and if there's enough processes
> then cgroup.procs reads aren't atomic either).

Sure, draining has to be iterative and in actual use cases freezing
would help but I think you're misunderstanding why I gave the example.
It wasn't meant to be "if you solve this problem case, we're all
good".  It was an illustration of the underlying problems where the
basic interface isn't fit for this sort of complex semantics.  Please
take a step back and look at the larger picture.  The current
interface is silly in many areas but it's still mostly integral and
I'd really like to keep at least that.

If there is a way to do this in a way which is not hacky, we can
definitely look into it.  However, given that the imminent problem is
relatively small, I'm not too sure that there would be a solution of
justifiable size.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-07-25 22:54 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-18 16:18 [PATCH v1 0/3] cgroup: allow for unprivileged management Aleksa Sarai
2016-07-18 16:18 ` [PATCH v1 1/3] kernfs: add support for custom per-sb permission hooks Aleksa Sarai
     [not found] ` <20160718161816.13040-1-asarai-l3A5Bk7waGM@public.gmane.org>
2016-07-18 16:18   ` [PATCH v1 2/3] cgroup: allow for unprivileged subtree management Aleksa Sarai
     [not found]     ` <20160718161816.13040-3-asarai-l3A5Bk7waGM@public.gmane.org>
2016-07-20 15:45       ` Tejun Heo
     [not found]         ` <20160720154557.GF4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-20 22:59           ` Aleksa Sarai
2016-07-18 16:18   ` [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants Aleksa Sarai
     [not found]     ` <20160718161816.13040-4-asarai-l3A5Bk7waGM@public.gmane.org>
2016-07-20 15:51       ` Tejun Heo
     [not found]         ` <20160720155147.GG4574-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-20 22:58           ` Aleksa Sarai
     [not found]             ` <6e975d80-4077-fb8b-ec84-708e37c8e149-l3A5Bk7waGM@public.gmane.org>
2016-07-20 23:02               ` Tejun Heo
     [not found]                 ` <20160720230228.GA19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-07-20 23:18                   ` Aleksa Sarai
     [not found]                     ` <982fcf3a-3685-9bd7-dd95-7bff255c9421-l3A5Bk7waGM@public.gmane.org>
2016-07-20 23:19                       ` Tejun Heo
     [not found]                         ` <20160720231949.GB19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-07-21  7:49                           ` Aleksa Sarai
     [not found]                             ` <379e5b13-29d4-ca75-1935-0a64f3db8d27-l3A5Bk7waGM@public.gmane.org>
2016-07-21 14:33                               ` Serge E. Hallyn
     [not found]                                 ` <20160721143330.GA5751-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-07-21 14:37                                   ` Aleksa Sarai
     [not found]                                     ` <2dc90947-cee7-90a9-3e60-4ca7c0de29d3-l3A5Bk7waGM@public.gmane.org>
2016-07-21 15:01                                       ` Tejun Heo
2016-07-21 15:09                                       ` Serge E. Hallyn
2016-07-21 14:51                                   ` James Bottomley
     [not found]                                     ` <1469112709.2331.11.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-07-21 14:59                                       ` Tejun Heo
     [not found]                                         ` <20160721145905.GC22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-21 15:07                                           ` Aleksa Sarai
2016-07-21 15:04                                             ` Tejun Heo
2016-07-21 14:52                               ` Tejun Heo
     [not found]                                 ` <20160721145242.GB22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-21 15:04                                   ` James Bottomley
     [not found]                                     ` <1469113456.2331.16.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-07-21 15:07                                       ` Tejun Heo
     [not found]                                         ` <20160721150740.GF22680-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-21 15:16                                           ` James Bottomley
2016-07-21 15:26                                             ` Tejun Heo
     [not found]                                               ` <20160721152648.GA23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-21 15:34                                                 ` James Bottomley
     [not found]                                                   ` <1469115276.2331.23.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-07-21 15:50                                                     ` Tejun Heo
     [not found]                                                       ` <20160721155046.GB23759-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-07-21 18:16                                                         ` James Bottomley
     [not found]                                                           ` <1469125002.2331.54.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
2016-07-21 21:06                                                             ` Tejun Heo
2016-07-22  8:30                                                       ` Aleksa Sarai
     [not found]                                                         ` <177bbc17-5c75-1ff8-0b1f-0c5601fa7e6b-l3A5Bk7waGM@public.gmane.org>
2016-07-25 18:38                                                           ` Tejun Heo
     [not found]                                                             ` <20160725183801.GE19588-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2016-07-25 22:54                                                               ` Serge E. Hallyn
2016-07-22  8:24                                           ` Aleksa Sarai
2016-07-25 18:44                                             ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).