cgroups.vger.kernel.org archive mirror
* [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found] ` <20120910223125.GC7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-10 22:33   ` Tejun Heo
       [not found]     ` <20120910223355.GD7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-10 22:33 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
  Cc: Michal Hocko, Li Zefan, Glauber Costa, Peter Zijlstra,
	Paul Turner, Johannes Weiner, Thomas Graf, Serge E. Hallyn,
	Vivek Goyal, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Neil Horman, Aneesh Kumar K.V

(forgot cc'ing containers / cgroups mailing lists and used the old
 address for Li.  Reposting.  Sorry for the noise.)

Currently, cgroup hierarchy support is a mess.  cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children.  blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup.  Others show yet different behaviors.

These differing interpretations of cgroup hierarchy make using cgroup
confusing and make it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.

Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy.  Having users rely on separate
hierarchies with completely different behaviors depending on the
mounted subsystem is detrimental to making any progress on this front.

This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support.  The goal of
this patch is two-fold.

* Move users away from using hierarchy on currently non-hierarchical
  subsystems, so that implementing proper hierarchy support on those
  doesn't surprise them.

* Keep track of which controllers are broken how and nudge the
  subsystems to implement proper hierarchy support.

For now, start with a single warning message.  We can whine louder
later on.

(I tried to document what's broken and how it should be fixed.  If I
 got something wrong, please let me know.)

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Cc: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Arnaldo Carvalho de Melo <acme-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
Cc: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Cc: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 block/blk-cgroup.c        |    8 ++++++++
 include/linux/cgroup.h    |   15 +++++++++++++++
 kernel/cgroup.c           |   11 ++++++++++-
 kernel/cgroup_freezer.c   |    8 ++++++++
 kernel/events/core.c      |    7 +++++++
 mm/memcontrol.c           |   12 +++++++++---
 net/core/netprio_cgroup.c |   12 +++++++++++-
 net/sched/cls_cgroup.c    |    9 +++++++++
 security/device_cgroup.c  |    9 +++++++++
 9 files changed, 86 insertions(+), 5 deletions(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -737,6 +737,14 @@ struct cgroup_subsys blkio_subsys = {
 	.subsys_id = blkio_subsys_id,
 	.base_cftypes = blkcg_files,
 	.module = THIS_MODULE,
+
+	/*
+	 * blkio subsystem is utterly broken in terms of hierarchy support.
+	 * It treats all cgroups equally regardless of where they're
+	 * located in the hierarchy - all cgroups are treated as if they're
+	 * right below the root.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 EXPORT_SYMBOL_GPL(blkio_subsys);
 
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -503,6 +503,21 @@ struct cgroup_subsys {
 	 */
 	bool __DEPRECATED_clear_css_refs;
 
+	/*
+	 * If %false, this subsystem is properly hierarchical -
+	 * configuration, resource accounting and restriction on a parent
+	 * cgroup cover those of its children.  If %true, hierarchy support
+	 * is broken in some ways - some subsystems ignore hierarchy
+	 * completely while others are only implemented half-way.
+	 *
+	 * It's now disallowed to create nested cgroups if the subsystem is
+	 * broken and cgroup core will emit a warning message on such
+	 * cases.  Eventually, all subsystems will be made properly
+	 * hierarchical and this will go away.
+	 */
+	bool broken_hierarchy;
+	bool warned_broken_hierarchy;
+
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
 
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4075,8 +4075,17 @@ static long cgroup_create(struct cgroup 
 		set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
 
 	for_each_subsys(root, ss) {
-		struct cgroup_subsys_state *css = ss->create(cgrp);
+		struct cgroup_subsys_state *css;
+
+		if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
+		    parent->parent) {
+			pr_warning("cgroup: \"%s\" is not properly hierarchical yet, do not nest cgroups.\n",
+				   ss->name);
+			pr_warning("cgroup: \"memory\" requires setting use_hierarchy to 1 on the root.\n");
+			ss->warned_broken_hierarchy = true;
+		}
 
+		css = ss->create(cgrp);
 		if (IS_ERR(css)) {
 			err = PTR_ERR(css);
 			goto err_destroy;
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -373,4 +373,12 @@ struct cgroup_subsys freezer_subsys = {
 	.can_attach	= freezer_can_attach,
 	.fork		= freezer_fork,
 	.base_cftypes	= files,
+
+	/*
+	 * freezer subsys doesn't handle hierarchy at all.  Frozen state
+	 * should be inherited through the hierarchy - if a parent is
+	 * frozen, all its children should be frozen.  Fix it and remove
+	 * the following.
+	 */
+	.broken_hierarchy = true,
 };
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7285,5 +7285,12 @@ struct cgroup_subsys perf_subsys = {
 	.destroy	= perf_cgroup_destroy,
 	.exit		= perf_cgroup_exit,
 	.attach		= perf_cgroup_attach,
+
+	/*
+	 * perf_event cgroup doesn't handle nesting correctly.
+	 * ctx->nr_cgroups adjustments should be propagated through the
+	 * cgroup hierarchy.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 #endif /* CONFIG_CGROUP_PERF */
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3855,12 +3855,17 @@ static int mem_cgroup_hierarchy_write(st
 	 */
 	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
 				(val == 1 || val == 0)) {
-		if (list_empty(&cont->children))
+		if (list_empty(&cont->children)) {
 			memcg->use_hierarchy = val;
-		else
+			/* we're fully hierarchical iff root uses hierarchy */
+			if (mem_cgroup_is_root(memcg))
+				mem_cgroup_subsys.broken_hierarchy = !val;
+		} else {
 			retval = -EBUSY;
-	} else
+		}
+	} else {
 		retval = -EINVAL;
+	}
 
 out:
 	cgroup_unlock();
@@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
 						&per_cpu(memcg_stock, cpu);
 			INIT_WORK(&stock->work, drain_local_stock);
 		}
+		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;
 		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -330,7 +330,17 @@ struct cgroup_subsys net_prio_subsys = {
 	.subsys_id	= net_prio_subsys_id,
 #endif
 	.base_cftypes	= ss_files,
-	.module		= THIS_MODULE
+	.module		= THIS_MODULE,
+
+	/*
+	 * net_prio has artificial limit on the number of cgroups and
+	 * disallows nesting making it impossible to co-mount it with other
+	 * hierarchical subsystems.  Remove the artificially low PRIOIDX_SZ
+	 * limit and properly nest configuration such that children follow
+	 * their parents' configurations by default and are allowed to
+	 * override and remove the following.
+	 */
+	.broken_hierarchy = trye,
 };
 
 static int netprio_device_event(struct notifier_block *unused,
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -82,6 +82,15 @@ struct cgroup_subsys net_cls_subsys = {
 #endif
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE,
+
+	/*
+	 * While net_cls cgroup has the rudimentary hierarchy support of
+	 * inheriting the parent's classid on cgroup creation, it doesn't
+	 * properly propagate config changes in ancestors to their
+	 * descendants.  A child should follow the parent's configuration
+	 * but be allowed to override it.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 
 struct cls_cgroup_head {
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -457,6 +457,15 @@ struct cgroup_subsys devices_subsys = {
 	.destroy = devcgroup_destroy,
 	.subsys_id = devices_subsys_id,
 	.base_cftypes = dev_cgroup_files,
+
+	/*
+	 * While devices cgroup has the rudimentary hierarchy support which
+	 * checks the parent's restriction, it doesn't properly propagate
+	 * config changes in ancestors to their descendants.  A child
+	 * should only be allowed to add more restrictions to the parent's
+	 * configuration.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 
 int __devcgroup_inode_permission(struct inode *inode, int mask)


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]     ` <20120910223355.GD7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-11 10:04       ` Michal Hocko
       [not found]         ` <20120911100433.GC8058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2012-09-11 12:38       ` Li Zefan
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2012-09-11 10:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

I like the approach in general but see the comments below:

On Mon 10-09-12 15:33:55, Tejun Heo wrote:
[...]
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3855,12 +3855,17 @@ static int mem_cgroup_hierarchy_write(st
>  	 */
>  	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
>  				(val == 1 || val == 0)) {
> -		if (list_empty(&cont->children))
> +		if (list_empty(&cont->children)) {
>  			memcg->use_hierarchy = val;
> -		else
> +			/* we're fully hierarchical iff root uses hierarchy */
> +			if (mem_cgroup_is_root(memcg))
> +				mem_cgroup_subsys.broken_hierarchy = !val;
> +		} else {
>  			retval = -EBUSY;
> -	} else
> +		}
> +	} else {
>  		retval = -EINVAL;
> +	}
>  
>  out:
>  	cgroup_unlock();
> @@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
>  						&per_cpu(memcg_stock, cpu);
>  			INIT_WORK(&stock->work, drain_local_stock);
>  		}
> +		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;

Hmmm, this will warn even if we have
root (default use_hierarchy=0)
 \
  A (use_hierarchy=1)
   \
    B <- here

which is unfortunate because it will add noise to a reasonable
configuration.
I think this is fixable if you move the warning to after
cgroup_subsys_state::create and set broken_hierarchy in
mem_cgroup_create only if the parent is not root and
use_hierarchy==0. Something like:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 795e525..d5c93ab 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4973,6 +4973,13 @@ mem_cgroup_create(struct cgroup *cont)
 	} else {
 		res_counter_init(&memcg->res, NULL);
 		res_counter_init(&memcg->memsw, NULL);
+		/*
+		 * Deeper hierarchy with use_hierarchy == false doesn't make
+		 * much sense so let cgroup subsystem know about this unfortunate
+		 * state in our controller.
+		 */
+		if (parent && parent != root_mem_cgroup)
+			mem_cgroup_subsys.broken_hierarchy = true;
 	}
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);

What do you think?

>  		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -330,7 +330,17 @@ struct cgroup_subsys net_prio_subsys = {
>  	.subsys_id	= net_prio_subsys_id,
>  #endif
>  	.base_cftypes	= ss_files,
> -	.module		= THIS_MODULE
> +	.module		= THIS_MODULE,
> +
> +	/*
> +	 * net_prio has artificial limit on the number of cgroups and
> +	 * disallows nesting making it impossible to co-mount it with other
> +	 * hierarchical subsystems.  Remove the artificially low PRIOIDX_SZ
> +	 * limit and properly nest configuration such that children follow
> +	 * their parents' configurations by default and are allowed to
> +	 * override and remove the following.
> +	 */
> +	.broken_hierarchy = trye,

typo

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]     ` <20120910223355.GD7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-11 10:04       ` Michal Hocko
@ 2012-09-11 12:38       ` Li Zefan
       [not found]         ` <504F30DB.60808-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2012-09-11 18:23       ` [PATCH UPDATED " Tejun Heo
  2012-09-13 12:16       ` [PATCH REPOST " Daniel P. Berrange
  3 siblings, 1 reply; 28+ messages in thread
From: Li Zefan @ 2012-09-11 12:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Glauber Costa,
	Peter Zijlstra, Paul Turner, Johannes Weiner, Thomas Graf,
	Serge E. Hallyn, Vivek Goyal, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Neil Horman, Aneesh Kumar K.V

On 2012/9/11 6:33, Tejun Heo wrote:
> (forgot cc'ing containers / cgroups mailing lists and used the old
>  address for Li.  Reposting.  Sorry for the noise.)
> 
> Currently, cgroup hierarchy support is a mess.  cpu related subsystems
> behave correctly - configuration, accounting and control on a parent
> properly cover its children.  blkio and freezer completely ignore
> hierarchy and treat all cgroups as if they're directly under the root
> cgroup.  Others show yet different behaviors.
> 
> These differing interpretations of cgroup hierarchy make using cgroup
> confusing and it impossible to co-mount controllers into the same
> hierarchy and obtain sane behavior.
> 
> Eventually, we want full hierarchy support from all subsystems and
> probably a unified hierarchy.  Users using separate hierarchies
> expecting completely different behaviors depending on the mounted
> subsystem is deterimental to making any progress on this front.
> 
> This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
> for controllers which are lacking in hierarchy support.  The goal of
> this patch is two-fold.
> 
> * Move users away from using hierarchy on currently non-hierarchical
>   subsystems, so that implementing proper hierarchy support on those
>   doesn't surprise them.
> 
> * Keep track of which controllers are broken how and nudge the
>   subsystems to implement proper hierarchy support.
> 
> For now, start with a single warning message.  We can whine louder
> later on.
> 
> (I tried to document what's broken and how it should be fixed.  If I
>  got something wrong, please let me know.)
> 

So isn't cpuset broken too? A child cpuset's cpu mask isn't necessarily a
subset of the parent's if the cpu_exclusive flag is not set.


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]         ` <20120911100433.GC8058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2012-09-11 17:07           ` Tejun Heo
       [not found]             ` <20120911170746.GL7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-12  9:31           ` Glauber Costa
  1 sibling, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-11 17:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Li Zefan, Glauber Costa,
	Peter Zijlstra, Paul Turner, Johannes Weiner, Thomas Graf,
	Serge E. Hallyn, Vivek Goyal, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Neil Horman, Aneesh Kumar K.V

Hello, Michal.

On Tue, Sep 11, 2012 at 12:04:33PM +0200, Michal Hocko wrote:
> >  	cgroup_unlock();
> > @@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
> >  						&per_cpu(memcg_stock, cpu);
> >  			INIT_WORK(&stock->work, drain_local_stock);
> >  		}
> > +		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;
> 
> Hmmm, this will warn even if we have
> root (default use_hierarchy=0)
>  \
>   A (use_hierarchy=1)
>    \
>     B <- here
> 
> which is unfortunate because it will add a noise to a reasonable
> configuration.

I suppose you're talking about having the root group not perform any
accounting and/or control?  I suppose that could be a valid use case
(is it really necessary tho?) but I don't think .use_hierarchy is the
right interface for that.  If it's absolutely necessary, I think it
should be a root-only flag (even if that ends up using the same code
path).  Eventually, we really want to kill .use_hierarchy, or at least
make it RO 1.  As it's currently defined, it's just way too
confusing.
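
To make that concrete, a root-only knob could look roughly like the
sketch below.  mem_cgroup_from_cont() and mem_cgroup_is_root() are the
existing helpers; the handler and the root_noaccount field are made up
purely for illustration, not a concrete proposal:

static int mem_cgroup_root_noaccount_write(struct cgroup *cont,
					   struct cftype *cft, u64 val)
{
	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);

	/* only the root group may toggle this */
	if (!mem_cgroup_is_root(memcg))
		return -EINVAL;

	/* hypothetical field standing in for "don't account at the root" */
	memcg->root_noaccount = !!val;
	return 0;
}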

> >  		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
> >  	} else {
> >  		parent = mem_cgroup_from_cont(cont->parent);
> > --- a/net/core/netprio_cgroup.c
> > +++ b/net/core/netprio_cgroup.c
> > @@ -330,7 +330,17 @@ struct cgroup_subsys net_prio_subsys = {
> >  	.subsys_id	= net_prio_subsys_id,
> >  #endif
> >  	.base_cftypes	= ss_files,
> > -	.module		= THIS_MODULE
> > +	.module		= THIS_MODULE,
> > +
> > +	/*
> > +	 * net_prio has artificial limit on the number of cgroups and
> > +	 * disallows nesting making it impossible to co-mount it with other
> > +	 * hierarchical subsystems.  Remove the artificially low PRIOIDX_SZ
> > +	 * limit and properly nest configuration such that children follow
> > +	 * their parents' configurations by default and are allowed to
> > +	 * override and remove the following.
> > +	 */
> > +	.broken_hierarchy = trye,
> 
> typo

Heh, I thought I enabled all controllers.  Thanks. :)

-- 
tejun


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]         ` <504F30DB.60808-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2012-09-11 17:08           ` Tejun Heo
       [not found]             ` <20120911170837.GM7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-12  9:34           ` Glauber Costa
  1 sibling, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-11 17:08 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Glauber Costa,
	Peter Zijlstra, Paul Turner, Johannes Weiner, Thomas Graf,
	Serge E. Hallyn, Vivek Goyal, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Neil Horman, Aneesh Kumar K.V

Hello, Li.

On Tue, Sep 11, 2012 at 08:38:51PM +0800, Li Zefan wrote:
> > (I tried to document what's broken and how it should be fixed.  If I
> >  got something wrong, please let me know.)
> > 
> 
> So isn't cpuset broken too? child cpuset's cpu mask isn't necessary a subset
> of the parent's if the cpu_exclusive flag is not set.

Heh, didn't even look at that.  Just assumed it was in the same boat
w/ cpu{acct}.  Will take a look.

Thanks.

-- 
tejun


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]             ` <20120911170837.GM7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-11 17:43               ` Tejun Heo
       [not found]                 ` <20120911174319.GO7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-11 17:43 UTC (permalink / raw)
  To: Li Zefan
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Ingo Molnar,
	Paul Mackerras, Aneesh Kumar K.V, Arnaldo Carvalho de Melo,
	Johannes Weiner, Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Paul Turner, Vivek Goyal

Hello,

On Tue, Sep 11, 2012 at 10:08:37AM -0700, Tejun Heo wrote:
> > So isn't cpuset broken too? child cpuset's cpu mask isn't necessary a subset
> > of the parent's if the cpu_exclusive flag is not set.
> 
> Heh, didn't even look at that.  Just assumed it was in the same boat
> w/ cpu{acct}.  Will take a look.

Hmm... Looked at it and I don't think hierarchy support is broken w/
or w/o exclusive.  Can you please elaborate?

Thanks.

-- 
tejun


* [PATCH UPDATED RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]     ` <20120910223355.GD7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-11 10:04       ` Michal Hocko
  2012-09-11 12:38       ` Li Zefan
@ 2012-09-11 18:23       ` Tejun Heo
       [not found]         ` <20120911182356.GR7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-13 12:16       ` [PATCH REPOST " Daniel P. Berrange
  3 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-11 18:23 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
  Cc: Michal Hocko, Li Zefan, Glauber Costa, Peter Zijlstra,
	Paul Turner, Johannes Weiner, Thomas Graf, Serge E. Hallyn,
	Vivek Goyal, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Neil Horman, Aneesh Kumar K.V

Currently, cgroup hierarchy support is a mess.  cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children.  blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup.  Others show yet different behaviors.

These differing interpretations of cgroup hierarchy make using cgroup
confusing and make it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.

Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy.  Having users rely on separate
hierarchies with completely different behaviors depending on the
mounted subsystem is detrimental to making any progress on this front.

This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support.  The goal of
this patch is two-fold.

* Move users away from using hierarchy on currently non-hierarchical
  subsystems, so that implementing proper hierarchy support on those
  doesn't surprise them.

* Keep track of which controllers are broken how and nudge the
  subsystems to implement proper hierarchy support.

For now, start with a single warning message.  We can whine louder
later on.

(I tried to document what's broken and how it should be fixed.  If I
 got something wrong, please let me know.)

v2: Fixed a typo spotted by Michal.  Warning message updated.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Cc: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Arnaldo Carvalho de Melo <acme-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
Cc: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Cc: Aneesh Kumar K.V <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
Here's a slightly updated version w/ updated warning message and typo
fixed.

Thanks.

 block/blk-cgroup.c        |    8 ++++++++
 include/linux/cgroup.h    |   15 +++++++++++++++
 kernel/cgroup.c           |   12 +++++++++++-
 kernel/cgroup_freezer.c   |    8 ++++++++
 kernel/events/core.c      |    7 +++++++
 mm/memcontrol.c           |   12 +++++++++---
 net/core/netprio_cgroup.c |   12 +++++++++++-
 net/sched/cls_cgroup.c    |    9 +++++++++
 security/device_cgroup.c  |    9 +++++++++
 9 files changed, 87 insertions(+), 5 deletions(-)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -737,6 +737,14 @@ struct cgroup_subsys blkio_subsys = {
 	.subsys_id = blkio_subsys_id,
 	.base_cftypes = blkcg_files,
 	.module = THIS_MODULE,
+
+	/*
+	 * blkio subsystem is utterly broken in terms of hierarchy support.
+	 * It treats all cgroups equally regardless of where they're
+	 * located in the hierarchy - all cgroups are treated as if they're
+	 * right below the root.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 EXPORT_SYMBOL_GPL(blkio_subsys);
 
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -503,6 +503,21 @@ struct cgroup_subsys {
 	 */
 	bool __DEPRECATED_clear_css_refs;
 
+	/*
+	 * If %false, this subsystem is properly hierarchical -
+	 * configuration, resource accounting and restriction on a parent
+	 * cgroup cover those of its children.  If %true, hierarchy support
+	 * is broken in some ways - some subsystems ignore hierarchy
+	 * completely while others are only implemented half-way.
+	 *
+	 * It's now disallowed to create nested cgroups if the subsystem is
+	 * broken and cgroup core will emit a warning message on such
+	 * cases.  Eventually, all subsystems will be made properly
+	 * hierarchical and this will go away.
+	 */
+	bool broken_hierarchy;
+	bool warned_broken_hierarchy;
+
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
 
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4075,8 +4075,18 @@ static long cgroup_create(struct cgroup 
 		set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
 
 	for_each_subsys(root, ss) {
-		struct cgroup_subsys_state *css = ss->create(cgrp);
+		struct cgroup_subsys_state *css;
+
+		if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
+		    parent->parent) {
+			pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
+				   current->comm, current->pid, ss->name);
+			if (!strcmp(ss->name, "memory"))
+				pr_warning("cgroup: \"memory\" requires setting use_hierarchy to 1 on the root.\n");
+			ss->warned_broken_hierarchy = true;
+		}
 
+		css = ss->create(cgrp);
 		if (IS_ERR(css)) {
 			err = PTR_ERR(css);
 			goto err_destroy;
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -373,4 +373,12 @@ struct cgroup_subsys freezer_subsys = {
 	.can_attach	= freezer_can_attach,
 	.fork		= freezer_fork,
 	.base_cftypes	= files,
+
+	/*
+	 * freezer subsys doesn't handle hierarchy at all.  Frozen state
+	 * should be inherited through the hierarchy - if a parent is
+	 * frozen, all its children should be frozen.  Fix it and remove
+	 * the following.
+	 */
+	.broken_hierarchy = true,
 };
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7285,5 +7285,12 @@ struct cgroup_subsys perf_subsys = {
 	.destroy	= perf_cgroup_destroy,
 	.exit		= perf_cgroup_exit,
 	.attach		= perf_cgroup_attach,
+
+	/*
+	 * perf_event cgroup doesn't handle nesting correctly.
+	 * ctx->nr_cgroups adjustments should be propagated through the
+	 * cgroup hierarchy.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 #endif /* CONFIG_CGROUP_PERF */
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3855,12 +3855,17 @@ static int mem_cgroup_hierarchy_write(st
 	 */
 	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
 				(val == 1 || val == 0)) {
-		if (list_empty(&cont->children))
+		if (list_empty(&cont->children)) {
 			memcg->use_hierarchy = val;
-		else
+			/* we're fully hierarchical iff root uses hierarchy */
+			if (mem_cgroup_is_root(memcg))
+				mem_cgroup_subsys.broken_hierarchy = !val;
+		} else {
 			retval = -EBUSY;
-	} else
+		}
+	} else {
 		retval = -EINVAL;
+	}
 
 out:
 	cgroup_unlock();
@@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
 						&per_cpu(memcg_stock, cpu);
 			INIT_WORK(&stock->work, drain_local_stock);
 		}
+		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;
 		hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -330,7 +330,17 @@ struct cgroup_subsys net_prio_subsys = {
 	.subsys_id	= net_prio_subsys_id,
 #endif
 	.base_cftypes	= ss_files,
-	.module		= THIS_MODULE
+	.module		= THIS_MODULE,
+
+	/*
+	 * net_prio has artificial limit on the number of cgroups and
+	 * disallows nesting making it impossible to co-mount it with other
+	 * hierarchical subsystems.  Remove the artificially low PRIOIDX_SZ
+	 * limit and properly nest configuration such that children follow
+	 * their parents' configurations by default and are allowed to
+	 * override and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 
 static int netprio_device_event(struct notifier_block *unused,
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -82,6 +82,15 @@ struct cgroup_subsys net_cls_subsys = {
 #endif
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE,
+
+	/*
+	 * While net_cls cgroup has the rudimentary hierarchy support of
+	 * inheriting the parent's classid on cgroup creation, it doesn't
+	 * properly propagate config changes in ancestors to their
+	 * descendants.  A child should follow the parent's configuration
+	 * but be allowed to override it.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 
 struct cls_cgroup_head {
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -457,6 +457,15 @@ struct cgroup_subsys devices_subsys = {
 	.destroy = devcgroup_destroy,
 	.subsys_id = devices_subsys_id,
 	.base_cftypes = dev_cgroup_files,
+
+	/*
+	 * While devices cgroup has the rudimentary hierarchy support which
+	 * checks the parent's restriction, it doesn't properly propagate
+	 * config changes in ancestors to their descendants.  A child
+	 * should only be allowed to add more restrictions to the parent's
+	 * configuration.  Fix it and remove the following.
+	 */
+	.broken_hierarchy = true,
 };
 
 int __devcgroup_inode_permission(struct inode *inode, int mask)


* Re: [PATCH UPDATED RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]         ` <20120911182356.GR7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-11 20:50           ` Aristeu Rozanski
       [not found]             ` <20120911205027.GA25837-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Aristeu Rozanski @ 2012-09-11 20:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hi Tejun,
On Tue, Sep 11, 2012 at 11:23:56AM -0700, Tejun Heo wrote:
> +			/* we're fully hierarchical iff root uses hierarchy */

minor nit: s/iff/if/

-- 
Aristeu


* Re: [PATCH UPDATED RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]             ` <20120911205027.GA25837-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-09-11 20:51               ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2012-09-11 20:51 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Li Zefan,
	Glauber Costa, Peter Zijlstra, Paul Turner, Johannes Weiner,
	Thomas Graf, Serge E. Hallyn, Vivek Goyal, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, Neil Horman,
	Aneesh Kumar K.V

Hello,

On Tue, Sep 11, 2012 at 1:50 PM, Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Hi Tejun,
> On Tue, Sep 11, 2012 at 11:23:56AM -0700, Tejun Heo wrote:
>> +                     /* we're fully hierarchical iff root uses hierarchy */
>
> minor nit: s/iff/if/

That was intentional: iff == if and only if. :)

Thanks.

-- 
tejun


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]         ` <20120911100433.GC8058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2012-09-11 17:07           ` Tejun Heo
@ 2012-09-12  9:31           ` Glauber Costa
       [not found]             ` <5050568B.9090601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2012-09-12  9:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Graf, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On 09/11/2012 02:04 PM, Michal Hocko wrote:
> I like the approach in general but see the comments bellow:
> 
> On Mon 10-09-12 15:33:55, Tejun Heo wrote:
> [...]
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3855,12 +3855,17 @@ static int mem_cgroup_hierarchy_write(st
>>  	 */
>>  	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
>>  				(val == 1 || val == 0)) {
>> -		if (list_empty(&cont->children))
>> +		if (list_empty(&cont->children)) {
>>  			memcg->use_hierarchy = val;
>> -		else
>> +			/* we're fully hierarchical iff root uses hierarchy */
>> +			if (mem_cgroup_is_root(memcg))
>> +				mem_cgroup_subsys.broken_hierarchy = !val;
>> +		} else {
>>  			retval = -EBUSY;
>> -	} else
>> +		}
>> +	} else {
>>  		retval = -EINVAL;
>> +	}
>>  
>>  out:
>>  	cgroup_unlock();
>> @@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
>>  						&per_cpu(memcg_stock, cpu);
>>  			INIT_WORK(&stock->work, drain_local_stock);
>>  		}
>> +		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;
> 
> Hmmm, this will warn even if we have
> root (default use_hierarchy=0)
>  \
>   A (use_hierarchy=1)
>    \
>     B <- here
> 
> which is unfortunate because it will add a noise to a reasonable
> configuration.
> I think this is fixable if you move the warning after 
> cgroup_subsys_state::create and broken_hierarchy would be set only if
> parent is not root and use_hierarchy==0 in mem_cgroup_create. Something
> like:
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 795e525..d5c93ab 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4973,6 +4973,13 @@ mem_cgroup_create(struct cgroup *cont)
>  	} else {
>  		res_counter_init(&memcg->res, NULL);
>  		res_counter_init(&memcg->memsw, NULL);
> +		/*
> +		 * Deeper hierachy with use_hierarchy == false doesn't make
> +		 * much sense so let cgroup subsystem know about this unfortunate
> +		 * state in our controller.
> +		 */
> +		if (parent && parent != root_mem_cgroup)
> +			mem_cgroup_subsys.broken_hierarchy = true;
>  	}
>  	memcg->last_scanned_node = MAX_NUMNODES;
>  	INIT_LIST_HEAD(&memcg->oom_notify);
> 
> What do you think?
> 
I side with Tejun's original intentions.

While I respect your goal of not warning about any configuration
with max_level = 1, I believe the only sane configuration as soon
as we get any 2nd-level child is use_hierarchy = 1 for everybody.

Everything aside from it should be warned.


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]         ` <504F30DB.60808-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2012-09-11 17:08           ` Tejun Heo
@ 2012-09-12  9:34           ` Glauber Costa
  1 sibling, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2012-09-12  9:34 UTC (permalink / raw)
  To: Li Zefan
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Thomas Graf,
	Ingo Molnar, Paul Mackerras, Aneesh Kumar K.V,
	Arnaldo Carvalho de Melo, Johannes Weiner, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner, Vivek Goyal

On 09/11/2012 04:38 PM, Li Zefan wrote:
> On 2012/9/11 6:33, Tejun Heo wrote:
>> (forgot cc'ing containers / cgroups mailing lists and used the old
>>  address for Li.  Reposting.  Sorry for the noise.)
>>
>> Currently, cgroup hierarchy support is a mess.  cpu related subsystems
>> behave correctly - configuration, accounting and control on a parent
>> properly cover its children.  blkio and freezer completely ignore
>> hierarchy and treat all cgroups as if they're directly under the root
>> cgroup.  Others show yet different behaviors.
>>
>> These differing interpretations of cgroup hierarchy make using cgroup
>> confusing and it impossible to co-mount controllers into the same
>> hierarchy and obtain sane behavior.
>>
>> Eventually, we want full hierarchy support from all subsystems and
>> probably a unified hierarchy.  Users using separate hierarchies
>> expecting completely different behaviors depending on the mounted
>> subsystem is deterimental to making any progress on this front.
>>
>> This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
>> for controllers which are lacking in hierarchy support.  The goal of
>> this patch is two-fold.
>>
>> * Move users away from using hierarchy on currently non-hierarchical
>>   subsystems, so that implementing proper hierarchy support on those
>>   doesn't surprise them.
>>
>> * Keep track of which controllers are broken how and nudge the
>>   subsystems to implement proper hierarchy support.
>>
>> For now, start with a single warning message.  We can whine louder
>> later on.
>>
>> (I tried to document what's broken and how it should be fixed.  If I
>>  got something wrong, please let me know.)
>>
> 
> So isn't cpuset broken too? child cpuset's cpu mask isn't necessary a subset
> of the parent's if the cpu_exclusive flag is not set.
> 

Li is right, cpu_exclusive is pretty much cpuset's version of memcg's
use_hierarchy.

The only problem is that it seems to be by design, unlike memcg's, which
was there to bypass a performance problem, to the best of my knowledge.
Even then, we should maybe think about starting to enforce that flag as well.


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                 ` <20120911174319.GO7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-12  9:37                   ` Glauber Costa
       [not found]                     ` <505057D8.4010908-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2012-09-12  9:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On 09/11/2012 09:43 PM, Tejun Heo wrote:
> Hmm... Looked at it and I don't think hierarchy support is broken w/
> or w/o exclusive.  Can you please elaborate?
> 
> Thanks.

From the docs:

"If a cpuset is cpu or mem exclusive, no other cpuset, other than
a direct ancestor or descendant, may share any of the same CPUs or
Memory Nodes."

So I think it tricked me as well. I was under the impression that
"exclusive" would also disallow the kids.


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]             ` <20120911170746.GL7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-12 15:47               ` Michal Hocko
       [not found]                 ` <20120912154745.GV21579-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2012-09-12 15:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Li Zefan, Glauber Costa,
	Peter Zijlstra, Paul Turner, Johannes Weiner, Thomas Graf,
	Serge E. Hallyn, Vivek Goyal, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Neil Horman, Aneesh Kumar K.V

On Tue 11-09-12 10:07:46, Tejun Heo wrote:
> Hello, Michal.
> 
> On Tue, Sep 11, 2012 at 12:04:33PM +0200, Michal Hocko wrote:
> > >  	cgroup_unlock();
> > > @@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
> > >  						&per_cpu(memcg_stock, cpu);
> > >  			INIT_WORK(&stock->work, drain_local_stock);
> > >  		}
> > > +		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;
> > 
> > Hmmm, this will warn even if we have
> > root (default use_hierarchy=0)
> >  \
> >   A (use_hierarchy=1)
> >    \
> >     B <- here
> > 
> > which is unfortunate because it will add a noise to a reasonable
> > configuration.
> 
> I suppose you're talking about having root group not performing any
> accounting and/or control?

It doesn't do any controlling because you cannot set any limit for it.
The root cgroup has always been special.

> I suppose such could be a valid use case
> (is it really necessary tho?)  but I don't think .use_hierarchy is the
> right interface for that.  

I am not sure I understand what you mean by that. My only concern
is that we shouldn't complain if somebody isn't doing anything wrong.
And creating a group under root without any other children, no matter
what use_hierarchy says, is a valid use case and we shouldn't make too
much noise about that.
The only difference in such a use case is that hierarchical stats will
include numbers from the group only if root had use_hierarchy==1. There
are no other side effects.
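
To be concrete, the main thing use_hierarchy changes at creation time
is whether the res counters get chained to the parent, so charges only
roll up through ancestors that have it set.  Roughly (this is just the
existing mem_cgroup_create() behavior paraphrased, not new code):

	if (parent && parent->use_hierarchy) {
		/* charges, and thus accounting, roll up to the parent */
		res_counter_init(&memcg->res, &parent->res);
		res_counter_init(&memcg->memsw, &parent->memsw);
	} else {
		/* flat accounting - nothing propagates to the parent */
		res_counter_init(&memcg->res, NULL);
		res_counter_init(&memcg->memsw, NULL);
	}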

> If it's absolutely necessary, I think it should be a root-only flag
> (even if that ends up using the same code path).  Eventually, we
> really want to kill .use_hierarchy, or at least make it to RO 1.  As
> it's currently defined, it's just way too confusing.

Agreed on that, definitely.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]             ` <5050568B.9090601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-12 15:49               ` Michal Hocko
       [not found]                 ` <20120912154907.GW21579-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2012-09-12 15:49 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Graf, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On Wed 12-09-12 13:31:55, Glauber Costa wrote:
> On 09/11/2012 02:04 PM, Michal Hocko wrote:
> > I like the approach in general but see the comments bellow:
> > 
> > On Mon 10-09-12 15:33:55, Tejun Heo wrote:
> > [...]
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -3855,12 +3855,17 @@ static int mem_cgroup_hierarchy_write(st
> >>  	 */
> >>  	if ((!parent_memcg || !parent_memcg->use_hierarchy) &&
> >>  				(val == 1 || val == 0)) {
> >> -		if (list_empty(&cont->children))
> >> +		if (list_empty(&cont->children)) {
> >>  			memcg->use_hierarchy = val;
> >> -		else
> >> +			/* we're fully hierarchical iff root uses hierarchy */
> >> +			if (mem_cgroup_is_root(memcg))
> >> +				mem_cgroup_subsys.broken_hierarchy = !val;
> >> +		} else {
> >>  			retval = -EBUSY;
> >> -	} else
> >> +		}
> >> +	} else {
> >>  		retval = -EINVAL;
> >> +	}
> >>  
> >>  out:
> >>  	cgroup_unlock();
> >> @@ -4953,6 +4958,7 @@ mem_cgroup_create(struct cgroup *cont)
> >>  						&per_cpu(memcg_stock, cpu);
> >>  			INIT_WORK(&stock->work, drain_local_stock);
> >>  		}
> >> +		mem_cgroup_subsys.broken_hierarchy = !memcg->use_hierarchy;
> > 
> > Hmmm, this will warn even if we have
> > root (default use_hierarchy=0)
> >  \
> >   A (use_hierarchy=1)
> >    \
> >     B <- here
> > 
> > which is unfortunate because it will add a noise to a reasonable
> > configuration.
> > I think this is fixable if you move the warning after 
> > cgroup_subsys_state::create and broken_hierarchy would be set only if
> > parent is not root and use_hierarchy==0 in mem_cgroup_create. Something
> > like:
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 795e525..d5c93ab 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4973,6 +4973,13 @@ mem_cgroup_create(struct cgroup *cont)
> >  	} else {
> >  		res_counter_init(&memcg->res, NULL);
> >  		res_counter_init(&memcg->memsw, NULL);
> > +		/*
> > +		 * Deeper hierachy with use_hierarchy == false doesn't make
> > +		 * much sense so let cgroup subsystem know about this unfortunate
> > +		 * state in our controller.
> > +		 */
> > +		if (parent && parent != root_mem_cgroup)
> > +			mem_cgroup_subsys.broken_hierarchy = true;
> >  	}
> >  	memcg->last_scanned_node = MAX_NUMNODES;
> >  	INIT_LIST_HEAD(&memcg->oom_notify);
> > 
> > What do you think?
> > 
> I side with Tejun's original intentions.
> 
> While I respect your goal of not warning about any configuration
> with max_level = 1, I believe the only sane configuration as soon
> as we get any 2nd-level child is use_hierarchy = 1 for everybody.
> 
> Everything aside from it should be warned.

Definitely. And that's what the above guarantees, doesn't it?

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                     ` <505057D8.4010908-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-12 16:34                       ` Tejun Heo
       [not found]                         ` <20120912163433.GL7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-12 16:34 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hello,

On Wed, Sep 12, 2012 at 01:37:28PM +0400, Glauber Costa wrote:
> "If a cpuset is cpu or mem exclusive, no other cpuset, other than
> a direct ancestor or descendant, may share any of the same CPUs or
> Memory Nodes."
> 
> So I think it tricked me as well. I was under the impression that
> "exclusive" would also disallow the kids.

You two are confusing me even more.  AFAICS, the hierarchical
properties don't seem to change whether exclusive is set or not.  It
still ensures children can't have something the parent doesn't allow,
and exclusive applies to whether to share something with siblings, so I
don't think anything is broken hierarchy-wise.  Am I missing
something?  If so, please be explicit and elaborate where and how it's
broken.

Thanks.

-- 
tejun


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                 ` <20120912154745.GV21579-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2012-09-12 16:41                   ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2012-09-12 16:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hello,

On Wed, Sep 12, 2012 at 05:47:45PM +0200, Michal Hocko wrote:
> > If it's absolutely necessary, I think it should be a root-only flag
> > (even if that ends up using the same code path).  Eventually, we
> > really want to kill .use_hierarchy, or at least make it to RO 1.  As
> > it's currently defined, it's just way too confusing.
> 
> Agreed on that, definitely.

The thing is that if we're gonna be headed that way, we're gonna have
to nudge people to .use_hierarchy = 1 everywhere first.  If
.use_hierarchy = 0 at root is a valid use case which absolutely needs
to be maintained (from what I read, I don't think it is), let's move
it to a different root-only flag; otherwise, it should be phased out.
As such, while the user might not be necessarily wrong, we still need
to unify the behavior, I think.

Thanks.

-- 
tejun


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                 ` <20120912154907.GW21579-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2012-09-12 17:11                   ` Tejun Heo
       [not found]                     ` <20120912171120.GP7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-12 17:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hello,

On Wed, Sep 12, 2012 at 05:49:07PM +0200, Michal Hocko wrote:
> > While I respect your goal of not warning about any configuration
> > with max_level = 1, I believe the only sane configuration as soon
> > as we get any 2nd-level child is use_hierarchy = 1 for everybody.
> > 
> > Everything aside from it should be warned.
> 
> Defintely. And that what the above guarantess, doesn't it?

I'm getting a bit worried that I might not be fully understanding what
your concern is.  Can you please elaborate on what your worries are and
the transition plan you have in mind regarding .use_hierarchy?

Thanks.

-- 
tejun


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                         ` <20120912163433.GL7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-13  6:48                           ` Li Zefan
  0 siblings, 0 replies; 28+ messages in thread
From: Li Zefan @ 2012-09-13  6:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Ingo Molnar,
	Paul Mackerras, Aneesh Kumar K.V, Arnaldo Carvalho de Melo,
	Johannes Weiner, Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Paul Turner, Vivek Goyal

On 2012/9/13 0:34, Tejun Heo wrote:
> Hello,
> 
> On Wed, Sep 12, 2012 at 01:37:28PM +0400, Glauber Costa wrote:
>> "If a cpuset is cpu or mem exclusive, no other cpuset, other than
>> a direct ancestor or descendant, may share any of the same CPUs or
>> Memory Nodes."
>>
>> So I think it tricked me as well. I was under the impression that
>> "exclusive" would also disallow the kids.
> 
> You two are confusing me even more.  AFAICS, the hierarchical
> properties don't seem to change whether exclusive is set or not.  It
> still ensures children can't have something parent doesn't allow and
> exclusive applies to whether to share something with siblings, so I
> don't think anything is broken hierarchy-wise.  Am I missing
> something?  If so, please be explicit and elaborate where and how it's
> broken.
> 

Ignore it. I misunderstood the exclusive flag. Sorry for the noise.


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                     ` <20120912171120.GP7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-13 12:01                       ` Glauber Costa
       [not found]                         ` <5051CB24.4010801-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2012-09-13 12:14                       ` Michal Hocko
  1 sibling, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2012-09-13 12:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On 09/12/2012 09:11 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Sep 12, 2012 at 05:49:07PM +0200, Michal Hocko wrote:
>>> While I respect your goal of not warning about any configuration
>>> with max_level = 1, I believe the only sane configuration as soon
>>> as we get any 2nd-level child is use_hierarchy = 1 for everybody.
>>>
>>> Everything aside from it should be warned.
>>
>> Defintely. And that what the above guarantess, doesn't it?
> 
> I'm getting a bit worried that I might not be fully understanding what
> your concern is.  Can you please elaborate what your worries are and
> the transition plan that you have in your mind regarding
> .use_hierarchy?
> 

This is getting confusing for me as well, because I don't know whether
your reply was targeted at me or Michal. As for me, I am in agreement
with what you did; I merely replied to Michal's concern and his
suggestion of not warning in the special first-level-only setups, saying
that I side with you.


* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                     ` <20120912171120.GP7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-09-13 12:01                       ` Glauber Costa
@ 2012-09-13 12:14                       ` Michal Hocko
       [not found]                         ` <20120913121438.GC8055-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2012-09-13 12:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On Wed 12-09-12 10:11:20, Tejun Heo wrote:
> Hello,
> 
> On Wed, Sep 12, 2012 at 05:49:07PM +0200, Michal Hocko wrote:
> > > While I respect your goal of not warning about any configuration
> > > with max_level = 1, I believe the only sane configuration as soon
> > > as we get any 2nd-level child is use_hierarchy = 1 for everybody.
> > > 
> > > Everything other than that should be warned about.
> > 
> > Definitely. And that's what the above guarantees, doesn't it?
> 
> I'm getting a bit worried that I might not be fully understanding what
> your concern is.  Can you please elaborate on what your worries are and
> on the transition plan that you have in mind regarding
> .use_hierarchy?

I would like to see use_hierarchy go away. The only concern I have is
to warn only if somebody is doing something wrong (aka flat
hierarchies). Or, to put it better: do not warn in cases which do not
change behavior if use_hierarchy goes away or its default changes to 1.
An example:
root (use_hierarchy=0)
 | \
 |  A (use_hierarchy=0)
 |
 B (use_hierarchy=1)
 |\
 C D

is a perfectly sane configuration, and I do not see any reason to fill
the logs with scary warnings when A is created. There will be no
semantic change in this setup when use_hierarchy is gone.

So the only thing I am proposing here is to warn only if something
should be fixed in the configuration in order to be prepared for fully
hierarchical behavior (and that is a second level of children from the
root with use_hierarchy==0).
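
(Illustration only, not code from this thread or from mm/memcontrol.c:
under those assumptions the check could look roughly like the sketch
below if placed on memcg's cgroup-creation path; the helper name and the
message text are invented.)

/* Illustration only: warn when, and only when, the new cgroup's
 * behavior would change once use_hierarchy is forced to 1. */
static void memcg_warn_if_flat_nesting(struct mem_cgroup *parent)
{
	/* first-level children of the root behave the same with
	 * use_hierarchy 0 or 1, so stay silent for them */
	if (!parent || mem_cgroup_is_root(parent))
		return;

	/* a second (or deeper) level created under use_hierarchy==0
	 * is exactly the configuration that needs fixing */
	if (!parent->use_hierarchy)
		pr_warn_once("memcg: flat (use_hierarchy==0) nesting is deprecated, please set memory.use_hierarchy to 1\n");
}

With a check along these lines, creating A or B in the example above
(and C or D under B) stays quiet, while creating a child under A, or
under any other non-root cgroup that still has use_hierarchy==0, warns
once.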

Does it make more sense now?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]     ` <20120910223355.GD7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
                         ` (2 preceding siblings ...)
  2012-09-11 18:23       ` [PATCH UPDATED " Tejun Heo
@ 2012-09-13 12:16       ` Daniel P. Berrange
       [not found]         ` <20120913121629.GL7767-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  3 siblings, 1 reply; 28+ messages in thread
From: Daniel P. Berrange @ 2012-09-13 12:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Ingo Molnar,
	Paul Mackerras, Aneesh Kumar K.V, Arnaldo Carvalho de Melo,
	Johannes Weiner, Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Paul Turner, Vivek Goyal

On Mon, Sep 10, 2012 at 03:33:55PM -0700, Tejun Heo wrote:
> (forgot cc'ing containers / cgroups mailing lists and used the old
>  address for Li.  Reposting.  Sorry for the noise.)
> 
> Currently, cgroup hierarchy support is a mess.  cpu related subsystems
> behave correctly - configuration, accounting and control on a parent
> properly cover its children.  blkio and freezer completely ignore
> hierarchy and treat all cgroups as if they're directly under the root
> cgroup.  Others show yet different behaviors.
> 
> These differing interpretations of cgroup hierarchy make using cgroup
> confusing and it impossible to co-mount controllers into the same
> hierarchy and obtain sane behavior.
> 
> Eventually, we want full hierarchy support from all subsystems and
> probably a unified hierarchy.  Users using separate hierarchies
> expecting completely different behaviors depending on the mounted
> subsystem is deterimental to making any progress on this front.
> 
> This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
> for controllers which are lacking in hierarchy support.  The goal of
> this patch is two-fold.
> 
> * Move users away from using hierarchy on currently non-hierarchical
>   subsystems, so that implementing proper hierarchy support on those
>   doesn't surprise them.
> 
> * Keep track of which controllers are broken how and nudge the
>   subsystems to implement proper hierarchy support.
> 
> For now, start with a single warning message.  We can whine louder
> later on.

If you want application developers / users to understand this, then
you really need to update the Documentation/cgroups/cgroups.txt file
to provide suitable recommendations on use of hierarchies. As it
stands today it arguably encourages apps to make use of arbitrary
hierarchies. While some of the other docs (blkio-controller.txt) do
describe the limitations of their implementation, they don't ever
go as far as recommending against usage of hierarchies, which is
what you seem to be saying.

Just printing warning messages in the logs with no docs to explain
the issues which cause the warnings is not friendly to users, and
IMHO will just lead people to ignore the warnings.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                         ` <20120913121438.GC8055-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2012-09-13 17:18                           ` Tejun Heo
       [not found]                             ` <20120913171832.GY7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-13 17:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hello, Michal.

On Thu, Sep 13, 2012 at 02:14:38PM +0200, Michal Hocko wrote:
> > I would like to see use_hierarchy go away. The only concern I have is
> > to warn only if somebody is doing something wrong (aka flat
> > hierarchies). Or, to put it better: do not warn in cases which do not
> > change behavior if use_hierarchy goes away or its default changes to 1.
> > An example:
> > root (use_hierarchy=0)
> >  | \
> >  |  A (use_hierarchy=0)
> >  |
> >  B (use_hierarchy=1)
> >  |\
> >  C D
> >
> > is a perfectly sane configuration, and I do not see any reason to fill
> > the logs with scary warnings when A is created. There will be no
> > semantic change in this setup when use_hierarchy is gone.
> >
> > So the only thing I am proposing here is to warn only if something
> > should be fixed in the configuration in order to be prepared for fully
> > hierarchical behavior (and that is a second level of children from the
> > root with use_hierarchy==0).
> 
> Does it make more sense now?

Ah, okay, so what you're saying is that we shouldn't warn if the
.use_hierarchy=0 settings don't make any behavior difference from when
they're all set to 1, right?  If so, I have no objection.  Will
incorporate your updated version.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                         ` <5051CB24.4010801-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-13 17:21                           ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2012-09-13 17:21 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hello, Glauber.

On Thu, Sep 13, 2012 at 04:01:40PM +0400, Glauber Costa wrote:
> This is getting confusing for me as well, because I don't know whether
> your reply was targeted at me or at Michal. As for me, I am in agreement
> with what you did; I merely replied to Michal's concern and his
> suggestion of not warning in the special first-level-only setups, saying
> that I side with you.

Oh, I was replying to Michal.  I thought there was a behavior
difference in the .use_hierarchy=0 cases that Michal was talking about,
but I don't think that's the case.  Michal is saying that we shouldn't
warn if the configuration behaves identically to one with
root.use_hierarchy=1 even though it contains some .use_hierarchy=0
settings, and I'm fine with that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                             ` <20120913171832.GY7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-13 17:39                               ` Michal Hocko
       [not found]                                 ` <20120913173958.GA21381-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2012-09-13 17:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On Thu 13-09-12 10:18:32, Tejun Heo wrote:
> Hello, Michal.
> 
> On Thu, Sep 13, 2012 at 02:14:38PM +0200, Michal Hocko wrote:
> > I would like to see use_hierarchy go away. The only concern I have is
> > to warn only if somebody is doing something wrong (aka flat
> > hierarchies). Or, to put it better: do not warn in cases which do not
> > change behavior if use_hierarchy goes away or its default changes to 1.
> > An example:
> > root (use_hierarchy=0)
> >  | \
> >  |  A (use_hierarchy=0)
> >  |
> >  B (use_hierarchy=1)
> >  |\
> >  C D
> >
> > is a perfectly sane configuration, and I do not see any reason to fill
> > the logs with scary warnings when A is created. There will be no
> > semantic change in this setup when use_hierarchy is gone.
> >
> > So the only thing I am proposing here is to warn only if something
> > should be fixed in the configuration in order to be prepared for fully
> > hierarchical behavior (and that is a second level of children from the
> > root with use_hierarchy==0).
> > 
> > Does it make more sense now?
> 
> Ah, okay, so what you're saying is that we shouldn't warn if the
> .use_hierarchy=0 settings don't make any behavior difference from when
> they're all set to 1, right?

Exactly. The first level of children under the root is exactly this
kind of setup.

> If so, I have no objection.  Will incorporate your updated version.

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]         ` <20120913121629.GL7767-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-09-13 17:52           ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2012-09-13 17:52 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Neil Horman, Michal Hocko,
	Paul Mackerras, Aneesh Kumar K.V, Arnaldo Carvalho de Melo,
	Johannes Weiner, Thomas Graf, Serge E. Hallyn, Paul Turner,
	Ingo Molnar, Vivek Goyal

Hello, Daniel.

On Thu, Sep 13, 2012 at 01:16:29PM +0100, Daniel P. Berrange wrote:
> If you want application developers / users to understand this, then
> you really need to update the Documentation/cgroups/cgroups.txt file
> to provide suitable recommendations on use of hierarchies. As it
> stands today it arguably encourages apps to make use of arbitrary
> hierarchies. While some of the other docs (blkio-controller.txt) do
> describe the limitations of their implementation, they don't ever
> go as far as recommending against usage of hierarchies, which is
> what you seem to be saying.
>
> Just printing warning messages in the logs with no docs to explain
> the issues which cause the warnings is not friendly to users, and
> IMHO will just lead people to ignore the warnings.

Oh yeah, we definitely need to update the docs.  No doubt about it.
Will do.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                                 ` <20120913173958.GA21381-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2012-09-14  8:19                                   ` Glauber Costa
       [not found]                                     ` <5052E87A.1050405-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Glauber Costa @ 2012-09-14  8:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Thomas Graf, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On 09/13/2012 09:39 PM, Michal Hocko wrote:
> On Thu 13-09-12 10:18:32, Tejun Heo wrote:
>> Hello, Michal.
>>
>> On Thu, Sep 13, 2012 at 02:14:38PM +0200, Michal Hocko wrote:
>>> I would like to see use_hierarchy go away. The only concern I have is
>>> to warn only if somebody is doing something wrong (aka flat
>>> hierarchies). Or, to put it better: do not warn in cases which do not
>>> change behavior if use_hierarchy goes away or its default changes to 1.
>>> An example:
>>> root (use_hierarchy=0)
>>>  | \
>>>  |  A (use_hierarchy=0)
>>>  |
>>>  B (use_hierarchy=1)
>>>  |\
>>>  C D
>>>
>>> is a perfectly sane configuration, and I do not see any reason to fill
>>> the logs with scary warnings when A is created. There will be no
>>> semantic change in this setup when use_hierarchy is gone.
>>>
>>> So the only thing I am proposing here is to warn only if something
>>> should be fixed in the configuration in order to be prepared for fully
>>> hierarchical behavior (and that is a second level of children from the
>>> root with use_hierarchy==0).
>>>
>>> Does it make more sense now?
>>
>> Ah, okay, so what you're saying is that we shouldn't warn if the
>> .use_hierarchy=0 settings don't make any behavior difference from when
>> they're all set to 1, right?
> 
> Exactly. The first level of children under the root is exactly this
> kind of setup.
> 
>> If so, I have no objection.  Will incorporate your updated version.
> 
> Thanks!
> 
I won't oppose it either, but I believe part of this exercise is to make
the need for hierarchy widely known. Warning in the 1st-level-only case
helps with that, even if we make more noise than we should.

The reason I supported Tejun's proposal originally is that I think that,
if we are going to make the wrong amount of noise, being wrong by a
surplus is better than being wrong by a deficit in this case.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                                     ` <5052E87A.1050405-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2012-09-14 19:15                                       ` Tejun Heo
       [not found]                                         ` <20120914191509.GN17747-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Tejun Heo @ 2012-09-14 19:15 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On Fri, Sep 14, 2012 at 12:19:06PM +0400, Glauber Costa wrote:
> I won't oppose it either, but I believe part of this exercise is to make
> the need for hierarchy widely known. Warning in the 1st-level-only case
> helps with that, even if we make more noise than we should.
> 
> The reason I supported Tejun's proposal originally is that I think that,
> if we are going to make the wrong amount of noise, being wrong by a
> surplus is better than being wrong by a deficit in this case.

I think both are valid points and don't think it makes a lot of
difference either way.  Michal being the maintainer of the code, I'm
taking his approach for this one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
       [not found]                                         ` <20120914191509.GN17747-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-09-17  9:11                                           ` Glauber Costa
  0 siblings, 0 replies; 28+ messages in thread
From: Glauber Costa @ 2012-09-17  9:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Neil Horman, Serge E. Hallyn,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Mackerras,
	Aneesh Kumar K.V, Arnaldo Carvalho de Melo, Johannes Weiner,
	Thomas Graf, cgroups-u79uwXL29TY76Z2rM5mHXA, Paul Turner,
	Ingo Molnar, Vivek Goyal

On 09/14/2012 11:15 PM, Tejun Heo wrote:
> On Fri, Sep 14, 2012 at 12:19:06PM +0400, Glauber Costa wrote:
>> I won't oppose it either, but I believe part of this exercise is to make
>> the need for hierarchy widely known. Warning in the 1st-level-only case
>> helps with that, even if we make more noise than we should.
>>
>> The reason I supported Tejun's proposal originally is that I think that,
>> if we are going to make the wrong amount of noise, being wrong by a
>> surplus is better than being wrong by a deficit in this case.
> 
> I think both are valid points and don't think it makes a lot of
> difference either way.  Michal being the maintainer of the code, I'm
> taking his approach for this one.
> 
> Thanks.
> 
That is fine.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-09-17  9:11 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20120910223125.GC7677@google.com>
     [not found] ` <20120910223125.GC7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-10 22:33   ` [PATCH REPOST RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them Tejun Heo
     [not found]     ` <20120910223355.GD7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-11 10:04       ` Michal Hocko
     [not found]         ` <20120911100433.GC8058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-09-11 17:07           ` Tejun Heo
     [not found]             ` <20120911170746.GL7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-12 15:47               ` Michal Hocko
     [not found]                 ` <20120912154745.GV21579-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-09-12 16:41                   ` Tejun Heo
2012-09-12  9:31           ` Glauber Costa
     [not found]             ` <5050568B.9090601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-12 15:49               ` Michal Hocko
     [not found]                 ` <20120912154907.GW21579-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-09-12 17:11                   ` Tejun Heo
     [not found]                     ` <20120912171120.GP7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-13 12:01                       ` Glauber Costa
     [not found]                         ` <5051CB24.4010801-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-13 17:21                           ` Tejun Heo
2012-09-13 12:14                       ` Michal Hocko
     [not found]                         ` <20120913121438.GC8055-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-09-13 17:18                           ` Tejun Heo
     [not found]                             ` <20120913171832.GY7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-13 17:39                               ` Michal Hocko
     [not found]                                 ` <20120913173958.GA21381-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-09-14  8:19                                   ` Glauber Costa
     [not found]                                     ` <5052E87A.1050405-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-14 19:15                                       ` Tejun Heo
     [not found]                                         ` <20120914191509.GN17747-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-17  9:11                                           ` Glauber Costa
2012-09-11 12:38       ` Li Zefan
     [not found]         ` <504F30DB.60808-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2012-09-11 17:08           ` Tejun Heo
     [not found]             ` <20120911170837.GM7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-11 17:43               ` Tejun Heo
     [not found]                 ` <20120911174319.GO7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-12  9:37                   ` Glauber Costa
     [not found]                     ` <505057D8.4010908-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-09-12 16:34                       ` Tejun Heo
     [not found]                         ` <20120912163433.GL7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-13  6:48                           ` Li Zefan
2012-09-12  9:34           ` Glauber Costa
2012-09-11 18:23       ` [PATCH UPDATED " Tejun Heo
     [not found]         ` <20120911182356.GR7677-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-09-11 20:50           ` Aristeu Rozanski
     [not found]             ` <20120911205027.GA25837-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-09-11 20:51               ` Tejun Heo
2012-09-13 12:16       ` [PATCH REPOST " Daniel P. Berrange
     [not found]         ` <20120913121629.GL7767-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-09-13 17:52           ` Tejun Heo
