cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive
@ 2016-02-24 22:02 Tejun Heo
  2016-02-24 22:02 ` [PATCH 01/16] cgroup: separate out interface file creation from css creation Tejun Heo
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

Hello,

Changes from v1[1] are

* Patches 0012-0014 and 0016 added.  They further refactor the
  operations and convert existing css management paths to use the new
  modular operations.

* 0005 updated to yield the correct result on v1 hierarchies too.

* 0015 updated to reflect earlier changes.

Currently, subsystem enabling and disabling in the default hierarchy
are implemented as a long chain of interdependent operations in
cgroup_subtree_control_write().  The function calculates what need to
be done to its children and excute the necessary operations.  The
function unfortunately ends up being a rather opaque blob of
operations which is difficult to wrap one's head around or reuse.

This patchset restructures control mask update so that it's composed
of discrete idempotent and recursive operations and convert existing
css managment paths to use them which makes them simpler, more robust
and capable of recursive operations.

This patchset includes the following 12 patches.

 0001-cgroup-separate-out-interface-file-creation-from-css.patch
 0002-cgroup-explicitly-track-whether-a-cgroup_subsys_stat.patch
 0003-cgroup-reorder-operations-in-cgroup_mkdir.patch
 0004-cgroup-factor-out-cgroup_create-out-of-cgroup_mkdir.patch
 0005-cgroup-introduce-cgroup_control-and-cgroup_ss_mask.patch
 0006-cgroup-factor-out-cgroup_drain_offline-from-cgroup_s.patch
 0007-cgroup-factor-out-cgroup_apply_control_disable-from-.patch
 0008-cgroup-factor-out-cgroup_apply_control_enable-from-c.patch
 0009-cgroup-make-cgroup_drain_offline-and-cgroup_apply_co.patch
 0010-cgroup-introduce-cgroup_-save-propagate-restore-_con.patch
 0011-cgroup-factor-out-cgroup_-apply-finalize-_control-fr.patch
 0012-cgroup-combine-cgroup_mutex-locking-and-offline-css-.patch
 0013-cgroup-use-cgroup_apply_enable_control-in-cgroup-cre.patch
 0014-cgroup-reimplement-rebind_subsystems-using-cgroup_ap.patch
 0015-cgroup-make-cgroup_calc_subtree_ss_mask-take-this_ss.patch
 0016-cgroup-allocate-2x-cgrp_cset_links-when-setting-up-a.patch

The patchset is available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-recursive-control

diffstat follows.  Thanks.

 include/linux/cgroup-defs.h |    3 
 kernel/cgroup.c             |  760 +++++++++++++++++++++++++-------------------
 2 files changed, 446 insertions(+), 317 deletions(-)

--
tejun

[1] http://lkml.kernel.org/g/1456199146-14765-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/16] cgroup: separate out interface file creation from css creation
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
  2016-02-24 22:02 ` [PATCH 02/16] cgroup: explicitly track whether a cgroup_subsys_state is visible to userland Tejun Heo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

Currently, interface files are created when a css is created depending
on whether @visible is set.  This patch separates out the two into
separate steps to help code refactoring and eventually allow cgroups
which aren't visible through cgroup fs.

Move css_populate_dir() out of create_css() and drop @visible.  While
at it, rename the function to css_create() for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 72 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ac5451e..3784f5a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -222,8 +222,8 @@ static struct cftype cgroup_legacy_base_files[];
 static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 static void css_task_iter_advance(struct css_task_iter *it);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
-static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
-		      bool visible);
+static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
+					      struct cgroup_subsys *ss);
 static void css_release(struct percpu_ref *ref);
 static void kill_css(struct cgroup_subsys_state *css);
 static int cgroup_addrm_files(struct cgroup_subsys_state *css,
@@ -3081,14 +3081,26 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	 */
 	do_each_subsys_mask(ss, ssid, enable) {
 		cgroup_for_each_live_child(child, cgrp) {
-			if (css_enable & (1 << ssid))
-				ret = create_css(child, ss,
-					cgrp->subtree_control & (1 << ssid));
-			else
+			if (css_enable & (1 << ssid)) {
+				struct cgroup_subsys_state *css;
+
+				css = css_create(child, ss);
+				if (IS_ERR(css)) {
+					ret = PTR_ERR(css);
+					goto err_undo_css;
+				}
+
+				if (cgrp->subtree_control & (1 << ssid)) {
+					ret = css_populate_dir(css, NULL);
+					if (ret)
+						goto err_undo_css;
+				}
+			} else {
 				ret = css_populate_dir(cgroup_css(child, ss),
 						       NULL);
-			if (ret)
-				goto err_undo_css;
+				if (ret)
+					goto err_undo_css;
+			}
 		}
 	} while_each_subsys_mask();
 
@@ -4716,7 +4728,9 @@ static void css_release_work_fn(struct work_struct *work)
 		 * Those are supported by RCU protecting clearing of
 		 * cgrp->kn->priv backpointer.
 		 */
-		RCU_INIT_POINTER(*(void __rcu __force **)&cgrp->kn->priv, NULL);
+		if (cgrp->kn)
+			RCU_INIT_POINTER(*(void __rcu __force **)&cgrp->kn->priv,
+					 NULL);
 	}
 
 	mutex_unlock(&cgroup_mutex);
@@ -4797,17 +4811,16 @@ static void offline_css(struct cgroup_subsys_state *css)
 }
 
 /**
- * create_css - create a cgroup_subsys_state
+ * css_create - create a cgroup_subsys_state
  * @cgrp: the cgroup new css will be associated with
  * @ss: the subsys of new css
- * @visible: whether to create control knobs for the new css or not
  *
  * Create a new css associated with @cgrp - @ss pair.  On success, the new
- * css is online and installed in @cgrp with all interface files created if
- * @visible.  Returns 0 on success, -errno on failure.
+ * css is online and installed in @cgrp.  This function doesn't create the
+ * interface files.  Returns 0 on success, -errno on failure.
  */
-static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
-		      bool visible)
+static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
+					      struct cgroup_subsys *ss)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	struct cgroup_subsys_state *parent_css = cgroup_css(parent, ss);
@@ -4818,7 +4831,7 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
 
 	css = ss->css_alloc(parent_css);
 	if (IS_ERR(css))
-		return PTR_ERR(css);
+		return css;
 
 	init_and_link_css(css, ss, cgrp);
 
@@ -4831,12 +4844,6 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
 		goto err_free_percpu_ref;
 	css->id = err;
 
-	if (visible) {
-		err = css_populate_dir(css, NULL);
-		if (err)
-			goto err_free_id;
-	}
-
 	/* @css is ready to be brought online now, make it visible */
 	list_add_tail_rcu(&css->sibling, &parent_css->children);
 	cgroup_idr_replace(&ss->css_idr, css, css->id);
@@ -4854,18 +4861,16 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
 		ss->warned_broken_hierarchy = true;
 	}
 
-	return 0;
+	return css;
 
 err_list_del:
 	list_del_rcu(&css->sibling);
-	css_clear_dir(css, NULL);
-err_free_id:
 	cgroup_idr_remove(&ss->css_idr, css->id);
 err_free_percpu_ref:
 	percpu_ref_exit(&css->refcnt);
 err_free_css:
 	call_rcu(&css->rcu_head, css_free_rcu_fn);
-	return err;
+	return ERR_PTR(err);
 }
 
 static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
@@ -4962,10 +4967,19 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 
 	/* let's create and online css's */
 	do_each_subsys_mask(ss, ssid, parent->subtree_ss_mask) {
-		ret = create_css(cgrp, ss,
-				 parent->subtree_control & (1 << ssid));
-		if (ret)
+		struct cgroup_subsys_state *css;
+
+		css = css_create(cgrp, ss);
+		if (IS_ERR(css)) {
+			ret = PTR_ERR(css);
 			goto out_destroy;
+		}
+
+		if (parent->subtree_control & (1 << ssid)) {
+			ret = css_populate_dir(css, NULL);
+			if (ret)
+				goto out_destroy;
+		}
 	} while_each_subsys_mask();
 
 	/*
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/16] cgroup: explicitly track whether a cgroup_subsys_state is visible to userland
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
  2016-02-24 22:02 ` [PATCH 01/16] cgroup: separate out interface file creation from css creation Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

Currently, whether a css (cgroup_subsys_state) has its interface files
created is not tracked and assumed to change together with the owning
cgroup's lifecycle.  cgroup directory and interface creation is being
separated out from internal object creation to help refactoring and
eventually allow cgroups which are not visible through cgroupfs.

This patch adds CSS_VISIBLE to track whether a css has its interface
files created and perform management operations only when necessary
which helps decoupling interface file handling from internal object
lifecycle.  After this patch, all css interface file management
functions can be called regardless of the current state and will
achieve the expected result.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h |  1 +
 kernel/cgroup.c             | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8fc3f04..7593c1a 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -45,6 +45,7 @@ enum {
 	CSS_NO_REF	= (1 << 0), /* no reference counting for this css */
 	CSS_ONLINE	= (1 << 1), /* between ->css_online() and ->css_offline() */
 	CSS_RELEASED	= (1 << 2), /* refcnt reached zero, released */
+	CSS_VISIBLE	= (1 << 3), /* css is visible to userland */
 };
 
 /* bits in struct cgroup flags field */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3784f5a..c5fa761 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1420,6 +1420,11 @@ static void css_clear_dir(struct cgroup_subsys_state *css,
 	struct cgroup *cgrp = cgrp_override ?: css->cgroup;
 	struct cftype *cfts;
 
+	if (!(css->flags & CSS_VISIBLE))
+		return;
+
+	css->flags &= ~CSS_VISIBLE;
+
 	list_for_each_entry(cfts, &css->ss->cfts, node)
 		cgroup_addrm_files(css, cgrp, cfts, false);
 }
@@ -1438,6 +1443,9 @@ static int css_populate_dir(struct cgroup_subsys_state *css,
 	struct cftype *cfts, *failed_cfts;
 	int ret;
 
+	if (css->flags & CSS_VISIBLE)
+		return 0;
+
 	if (!css->ss) {
 		if (cgroup_on_dfl(cgrp))
 			cfts = cgroup_dfl_base_files;
@@ -1454,6 +1462,9 @@ static int css_populate_dir(struct cgroup_subsys_state *css,
 			goto err;
 		}
 	}
+
+	css->flags |= CSS_VISIBLE;
+
 	return 0;
 err:
 	list_for_each_entry(cfts, &css->ss->cfts, node) {
@@ -3402,7 +3413,7 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
 	css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
 		struct cgroup *cgrp = css->cgroup;
 
-		if (cgroup_is_dead(cgrp))
+		if (!(css->flags & CSS_VISIBLE))
 			continue;
 
 		ret = cgroup_addrm_files(css, cgrp, cfts, is_add);
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/16] cgroup: reorder operations in cgroup_mkdir()
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 07/16] cgroup: factor out cgroup_apply_control_disable() from cgroup_subtree_control_write() Tejun Heo
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

Currently, operations to initialize internal objects and create
interface directory and files are intermixed in cgroup_mkdir().  We're
in the process of refactoring cgroup and css management paths to
separate them out to eventually allow cgroups which aren't visible
through cgroup fs.

This patch reorders operations inside cgroup_mkdir() so that interface
directory and file handling comes after internal object
initialization.  This will enable further refactoring.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 61 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 30 insertions(+), 31 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c5fa761..204f78a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4941,20 +4941,6 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags))
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags);
 
-	/* create the directory */
-	kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
-	if (IS_ERR(kn)) {
-		ret = PTR_ERR(kn);
-		goto out_free_id;
-	}
-	cgrp->kn = kn;
-
-	/*
-	 * This extra ref will be put in cgroup_free_fn() and guarantees
-	 * that @cgrp->kn is always accessible.
-	 */
-	kernfs_get(kn);
-
 	cgrp->self.serial_nr = css_serial_nr_next++;
 
 	/* allocation complete, commit to creation */
@@ -4968,15 +4954,7 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	 */
 	cgroup_idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
 
-	ret = cgroup_kn_set_ugid(kn);
-	if (ret)
-		goto out_destroy;
-
-	ret = css_populate_dir(&cgrp->self, NULL);
-	if (ret)
-		goto out_destroy;
-
-	/* let's create and online css's */
+	/* create the csses */
 	do_each_subsys_mask(ss, ssid, parent->subtree_ss_mask) {
 		struct cgroup_subsys_state *css;
 
@@ -4985,12 +4963,6 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 			ret = PTR_ERR(css);
 			goto out_destroy;
 		}
-
-		if (parent->subtree_control & (1 << ssid)) {
-			ret = css_populate_dir(css, NULL);
-			if (ret)
-				goto out_destroy;
-		}
 	} while_each_subsys_mask();
 
 	/*
@@ -5002,13 +4974,40 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 		cgroup_refresh_subtree_ss_mask(cgrp);
 	}
 
+	/* create the directory */
+	kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out_destroy;
+	}
+	cgrp->kn = kn;
+
+	/*
+	 * This extra ref will be put in cgroup_free_fn() and guarantees
+	 * that @cgrp->kn is always accessible.
+	 */
+	kernfs_get(kn);
+
+	ret = cgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
+
+	ret = css_populate_dir(&cgrp->self, NULL);
+	if (ret)
+		goto out_destroy;
+
+	do_each_subsys_mask(ss, ssid, parent->subtree_control) {
+		ret = css_populate_dir(cgroup_css(cgrp, ss), NULL);
+		if (ret)
+			goto out_destroy;
+	} while_each_subsys_mask();
+
+	/* let's create and online css's */
 	kernfs_activate(kn);
 
 	ret = 0;
 	goto out_unlock;
 
-out_free_id:
-	cgroup_idr_remove(&root->cgroup_idr, cgrp->id);
 out_cancel_ref:
 	percpu_ref_exit(&cgrp->self.refcnt);
 out_free_cgrp:
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/16] cgroup: factor out cgroup_create() out of cgroup_mkdir()
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (2 preceding siblings ...)
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2016-02-24 22:02 ` Tejun Heo
  2016-02-24 22:02 ` [PATCH 05/16] cgroup: introduce cgroup_control() and cgroup_ss_mask() Tejun Heo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

We're in the process of refactoring cgroup and css management paths to
separate them out to eventually allow cgroups which aren't visible
through cgroup fs.  This patch factors out cgroup_create() out of
cgroup_mkdir().  cgroup_create() contains all internal object creation
and initialization.  cgroup_mkdir() uses cgroup_create() to create the
internal cgroup and adds interface directory and file creation.

This patch doesn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 72 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 204f78a..070c078 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4884,33 +4884,19 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 	return ERR_PTR(err);
 }
 
-static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
-			umode_t mode)
+static struct cgroup *cgroup_create(struct cgroup *parent)
 {
-	struct cgroup *parent, *cgrp, *tcgrp;
-	struct cgroup_root *root;
+	struct cgroup_root *root = parent->root;
 	struct cgroup_subsys *ss;
-	struct kernfs_node *kn;
-	int level, ssid, ret;
-
-	/* Do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable.
-	 */
-	if (strchr(name, '\n'))
-		return -EINVAL;
-
-	parent = cgroup_kn_lock_live(parent_kn);
-	if (!parent)
-		return -ENODEV;
-	root = parent->root;
-	level = parent->level + 1;
+	struct cgroup *cgrp, *tcgrp;
+	int level = parent->level + 1;
+	int ssid, ret;
 
 	/* allocate the cgroup and its ID, 0 is reserved for the root */
 	cgrp = kzalloc(sizeof(*cgrp) +
 		       sizeof(cgrp->ancestor_ids[0]) * (level + 1), GFP_KERNEL);
-	if (!cgrp) {
-		ret = -ENOMEM;
-		goto out_unlock;
-	}
+	if (!cgrp)
+		return ERR_PTR(-ENOMEM);
 
 	ret = percpu_ref_init(&cgrp->self.refcnt, css_release, 0, GFP_KERNEL);
 	if (ret)
@@ -4974,6 +4960,40 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 		cgroup_refresh_subtree_ss_mask(cgrp);
 	}
 
+	return cgrp;
+
+out_cancel_ref:
+	percpu_ref_exit(&cgrp->self.refcnt);
+out_free_cgrp:
+	kfree(cgrp);
+	return ERR_PTR(ret);
+out_destroy:
+	cgroup_destroy_locked(cgrp);
+	return ERR_PTR(ret);
+}
+
+static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
+			umode_t mode)
+{
+	struct cgroup *parent, *cgrp;
+	struct cgroup_subsys *ss;
+	struct kernfs_node *kn;
+	int ssid, ret;
+
+	/* do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable */
+	if (strchr(name, '\n'))
+		return -EINVAL;
+
+	parent = cgroup_kn_lock_live(parent_kn);
+	if (!parent)
+		return -ENODEV;
+
+	cgrp = cgroup_create(parent);
+	if (IS_ERR(cgrp)) {
+		ret = PTR_ERR(cgrp);
+		goto out_unlock;
+	}
+
 	/* create the directory */
 	kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
 	if (IS_ERR(kn)) {
@@ -5008,17 +5028,11 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	ret = 0;
 	goto out_unlock;
 
-out_cancel_ref:
-	percpu_ref_exit(&cgrp->self.refcnt);
-out_free_cgrp:
-	kfree(cgrp);
+out_destroy:
+	cgroup_destroy_locked(cgrp);
 out_unlock:
 	cgroup_kn_unlock(parent_kn);
 	return ret;
-
-out_destroy:
-	cgroup_destroy_locked(cgrp);
-	goto out_unlock;
 }
 
 /*
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/16] cgroup: introduce cgroup_control() and cgroup_ss_mask()
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (3 preceding siblings ...)
  2016-02-24 22:02 ` [PATCH 04/16] cgroup: factor out cgroup_create() out of cgroup_mkdir() Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
  2016-02-24 22:02 ` [PATCH 06/16] cgroup: factor out cgroup_drain_offline() from cgroup_subtree_control_write() Tejun Heo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

When a controller is enabled and visible on a non-root cgroup is
determined by subtree_control and subtree_ss_mask of the parent
cgroup.  For a root cgroup, by the type of the hierarchy and which
controllers are attached to it.  Deciding the above on each usage is
fragile and unnecessarily complicates the users.

This patch introduces cgroup_control() and cgroup_ss_mask() which
calculate and return the [visibly] enabled subsyste mask for the
specified cgroup and conver the existing usages.

* cgroup_e_css() is restructured for simplicity.

* cgroup_calc_subtree_ss_mask() and cgroup_subtree_control_write() no
  longer need to distinguish root and non-root cases.

* With cgroup_control(), cgroup_controllers_show() can now handle both
  root and non-root cases.  cgroup_root_controllers_show() is removed.

v2: cgroup_control() updated to yield the correct result on v1
    hierarchies too.  cgroup_subtree_control_write() converted.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 72 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 37 insertions(+), 35 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 070c078..ada8452 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -346,6 +346,32 @@ static struct cgroup *cgroup_parent(struct cgroup *cgrp)
 	return NULL;
 }
 
+/* subsystems visibly enabled on a cgroup */
+static u16 cgroup_control(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+	u16 root_ss_mask = cgrp->root->subsys_mask;
+
+	if (parent)
+		return parent->subtree_control;
+
+	if (cgroup_on_dfl(cgrp))
+		root_ss_mask &= ~cgrp_dfl_inhibit_ss_mask;
+
+	return root_ss_mask;
+}
+
+/* subsystems enabled on a cgroup */
+static u16 cgroup_ss_mask(struct cgroup *cgrp)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	if (parent)
+		return parent->subtree_ss_mask;
+
+	return cgrp->root->subsys_mask;
+}
+
 /**
  * cgroup_css - obtain a cgroup's css for the specified subsystem
  * @cgrp: the cgroup of interest
@@ -385,16 +411,15 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
 	if (!ss)
 		return &cgrp->self;
 
-	if (!(cgrp->root->subsys_mask & (1 << ss->id)))
-		return NULL;
-
 	/*
 	 * This function is used while updating css associations and thus
-	 * can't test the csses directly.  Use ->subtree_ss_mask.
+	 * can't test the csses directly.  Test ss_mask.
 	 */
-	while (cgroup_parent(cgrp) &&
-	       !(cgroup_parent(cgrp)->subtree_ss_mask & (1 << ss->id)))
+	while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) {
 		cgrp = cgroup_parent(cgrp);
+		if (!cgrp)
+			return NULL;
+	}
 
 	return cgroup_css(cgrp, ss);
 }
@@ -1275,7 +1300,6 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
  */
 static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control)
 {
-	struct cgroup *parent = cgroup_parent(cgrp);
 	u16 cur_ss_mask = subtree_control;
 	struct cgroup_subsys *ss;
 	int ssid;
@@ -1297,10 +1321,7 @@ static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control)
 		 * happen only if some depended-upon subsystems were bound
 		 * to non-default hierarchies.
 		 */
-		if (parent)
-			new_ss_mask &= parent->subtree_ss_mask;
-		else
-			new_ss_mask &= cgrp->root->subsys_mask;
+		new_ss_mask &= cgroup_ss_mask(cgrp);
 
 		if (new_ss_mask == cur_ss_mask)
 			break;
@@ -2863,22 +2884,12 @@ static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
 		seq_putc(seq, '\n');
 }
 
-/* show controllers which are currently attached to the default hierarchy */
-static int cgroup_root_controllers_show(struct seq_file *seq, void *v)
-{
-	struct cgroup *cgrp = seq_css(seq)->cgroup;
-
-	cgroup_print_ss_mask(seq, cgrp->root->subsys_mask &
-			     ~cgrp_dfl_inhibit_ss_mask);
-	return 0;
-}
-
 /* show controllers which are enabled from the parent */
 static int cgroup_controllers_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
 
-	cgroup_print_ss_mask(seq, cgroup_parent(cgrp)->subtree_control);
+	cgroup_print_ss_mask(seq, cgroup_control(cgrp));
 	return 0;
 }
 
@@ -3004,10 +3015,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 				continue;
 			}
 
-			/* unavailable or not enabled on the parent? */
-			if (!(cgrp_dfl_root.subsys_mask & (1 << ssid)) ||
-			    (cgroup_parent(cgrp) &&
-			     !(cgroup_parent(cgrp)->subtree_control & (1 << ssid)))) {
+			if (!(cgroup_control(cgrp) & (1 << ssid))) {
 				ret = -ENOENT;
 				goto out_unlock;
 			}
@@ -4565,12 +4573,6 @@ static struct cftype cgroup_dfl_base_files[] = {
 	},
 	{
 		.name = "cgroup.controllers",
-		.flags = CFTYPE_ONLY_ON_ROOT,
-		.seq_show = cgroup_root_controllers_show,
-	},
-	{
-		.name = "cgroup.controllers",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = cgroup_controllers_show,
 	},
 	{
@@ -4941,7 +4943,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	cgroup_idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
 
 	/* create the csses */
-	do_each_subsys_mask(ss, ssid, parent->subtree_ss_mask) {
+	do_each_subsys_mask(ss, ssid, cgroup_ss_mask(cgrp)) {
 		struct cgroup_subsys_state *css;
 
 		css = css_create(cgrp, ss);
@@ -4956,7 +4958,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	 * subtree_control from the parent.  Each is configured manually.
 	 */
 	if (!cgroup_on_dfl(cgrp)) {
-		cgrp->subtree_control = parent->subtree_control;
+		cgrp->subtree_control = cgroup_control(cgrp);
 		cgroup_refresh_subtree_ss_mask(cgrp);
 	}
 
@@ -5016,7 +5018,7 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (ret)
 		goto out_destroy;
 
-	do_each_subsys_mask(ss, ssid, parent->subtree_control) {
+	do_each_subsys_mask(ss, ssid, cgroup_control(cgrp)) {
 		ret = css_populate_dir(cgroup_css(cgrp, ss), NULL);
 		if (ret)
 			goto out_destroy;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/16] cgroup: factor out cgroup_drain_offline() from cgroup_subtree_control_write()
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (4 preceding siblings ...)
  2016-02-24 22:02 ` [PATCH 05/16] cgroup: introduce cgroup_control() and cgroup_ss_mask() Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
  2016-02-24 22:02 ` [PATCH 09/16] cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive Tejun Heo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

Factor out async css offline draining into cgroup_drain_offline().

* Nest subsystem walk inside child walk.  The child walk will later be
  converted to subtree walk which is a bit more expensive.

* Relocate the draining above subsystem mask preparation, which
  doesn't create any behavior differences but helps further
  refactoring.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 77 ++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 52 insertions(+), 25 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ada8452..7f1c636 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2964,6 +2964,53 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 	return ret;
 }
 
+/**
+ * cgroup_drain_offline - wait for previously offlined csses to go away
+ * @cgrp: parent of the target cgroups
+ *
+ * Because css offlining is asynchronous, userland may try to re-enable a
+ * controller while the previous css is still around.  This function drains
+ * the previous css instances of @cgrp's children.
+ *
+ * Must be called with cgroup_mutex held.  Returns %false if there were no
+ * dying css instances.  Returns %true if there were one or more and this
+ * function waited.  On %true return, cgroup_mutex has been dropped and
+ * re-acquired inbetween which anything could have happened.  The caller
+ * typically would have to start over.
+ */
+static bool cgroup_drain_offline(struct cgroup *cgrp)
+{
+	struct cgroup *dsct;
+	struct cgroup_subsys *ss;
+	int ssid;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	cgroup_for_each_live_child(dsct, cgrp) {
+		for_each_subsys(ss, ssid) {
+			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
+			DEFINE_WAIT(wait);
+
+			if (!css)
+				continue;
+
+			cgroup_get(dsct);
+			prepare_to_wait(&dsct->offline_waitq, &wait,
+					TASK_UNINTERRUPTIBLE);
+
+			mutex_unlock(&cgroup_mutex);
+			schedule();
+			finish_wait(&dsct->offline_waitq, &wait);
+			mutex_lock(&cgroup_mutex);
+
+			cgroup_put(dsct);
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /* change the enabled child controllers for a cgroup in the default hierarchy */
 static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
@@ -3049,6 +3096,11 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 		goto out_unlock;
 	}
 
+	if (cgroup_drain_offline(cgrp)) {
+		cgroup_kn_unlock(of->kn);
+		return restart_syscall();
+	}
+
 	/*
 	 * Update subsys masks and calculate what needs to be done.  More
 	 * subsystems than specified may need to be enabled or disabled
@@ -3064,31 +3116,6 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	enable |= css_enable;
 	disable |= css_disable;
 
-	/*
-	 * Because css offlining is asynchronous, userland might try to
-	 * re-enable the same controller while the previous instance is
-	 * still around.  In such cases, wait till it's gone using
-	 * offline_waitq.
-	 */
-	do_each_subsys_mask(ss, ssid, css_enable) {
-		cgroup_for_each_live_child(child, cgrp) {
-			DEFINE_WAIT(wait);
-
-			if (!cgroup_css(child, ss))
-				continue;
-
-			cgroup_get(child);
-			prepare_to_wait(&child->offline_waitq, &wait,
-					TASK_UNINTERRUPTIBLE);
-			cgroup_kn_unlock(of->kn);
-			schedule();
-			finish_wait(&child->offline_waitq, &wait);
-			cgroup_put(child);
-
-			return restart_syscall();
-		}
-	} while_each_subsys_mask();
-
 	cgrp->subtree_control = new_sc;
 	cgrp->subtree_ss_mask = new_ss;
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/16] cgroup: factor out cgroup_apply_control_disable() from cgroup_subtree_control_write()
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2016-02-24 22:02   ` [PATCH 03/16] cgroup: reorder operations in cgroup_mkdir() Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 08/16] cgroup: factor out cgroup_apply_control_enable() " Tejun Heo
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

Factor out css disabling and hiding into cgroup_apply_control_disable().

* Nest subsystem walk inside child walk.  The child walk will later be
  converted to subtree walk which is a bit more expensive.

* Instead of operating on the differential masks @css_enable and
  @css_disable, simply disable or hide csses according to the current
  cgroup_control() and cgroup_ss_mask().  This leads to the same
  result and is simpler and more robust.

* This allows error handling path to share the same code.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 74 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7f1c636..4b8a6eb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3011,6 +3011,43 @@ static bool cgroup_drain_offline(struct cgroup *cgrp)
 	return false;
 }
 
+/**
+ * cgroup_apply_control_disable - kill or hide csses according to control
+ * @cgrp: parent of the target cgroups
+ *
+ * Walk @cgrp's children and kill and hide csses so that they match
+ * cgroup_ss_mask() and cgroup_visible_mask().
+ *
+ * A css is hidden when the userland requests it to be disabled while other
+ * subsystems are still depending on it.  The css must not actively control
+ * resources and be in the vanilla state if it's made visible again later.
+ * Controllers which may be depended upon should provide ->css_reset() for
+ * this purpose.
+ */
+static void cgroup_apply_control_disable(struct cgroup *cgrp)
+{
+	struct cgroup *dsct;
+	struct cgroup_subsys *ss;
+	int ssid;
+
+	cgroup_for_each_live_child(dsct, cgrp) {
+		for_each_subsys(ss, ssid) {
+			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
+
+			if (!css)
+				continue;
+
+			if (!(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+				kill_css(css);
+			} else if (!(cgroup_control(dsct) & (1 << ss->id))) {
+				css_clear_dir(css, NULL);
+				if (ss->css_reset)
+					ss->css_reset(css);
+			}
+		}
+	}
+}
+
 /* change the enabled child controllers for a cgroup in the default hierarchy */
 static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
@@ -3159,27 +3196,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	if (ret)
 		goto err_undo_css;
 
-	/*
-	 * All tasks are migrated out of disabled csses.  Kill or hide
-	 * them.  A css is hidden when the userland requests it to be
-	 * disabled while other subsystems are still depending on it.  The
-	 * css must not actively control resources and be in the vanilla
-	 * state if it's made visible again later.  Controllers which may
-	 * be depended upon should provide ->css_reset() for this purpose.
-	 */
-	do_each_subsys_mask(ss, ssid, disable) {
-		cgroup_for_each_live_child(child, cgrp) {
-			struct cgroup_subsys_state *css = cgroup_css(child, ss);
-
-			if (css_disable & (1 << ssid)) {
-				kill_css(css);
-			} else {
-				css_clear_dir(css, NULL);
-				if (ss->css_reset)
-					ss->css_reset(css);
-			}
-		}
-	} while_each_subsys_mask();
+	/* all tasks are migrated out of disabled csses, commit disable */
+	cgroup_apply_control_disable(cgrp);
 
 	kernfs_activate(cgrp->kn);
 	ret = 0;
@@ -3188,22 +3206,12 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 
 err_undo_css:
+	/* restore masks and shoot down new csses */
 	cgrp->subtree_control = old_sc;
 	cgrp->subtree_ss_mask = old_ss;
 
-	do_each_subsys_mask(ss, ssid, enable) {
-		cgroup_for_each_live_child(child, cgrp) {
-			struct cgroup_subsys_state *css = cgroup_css(child, ss);
-
-			if (!css)
-				continue;
+	cgroup_apply_control_disable(cgrp);
 
-			if (css_enable & (1 << ssid))
-				kill_css(css);
-			else
-				css_clear_dir(css, NULL);
-		}
-	} while_each_subsys_mask();
 	goto out_unlock;
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/16] cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write()
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2016-02-24 22:02   ` [PATCH 03/16] cgroup: reorder operations in cgroup_mkdir() Tejun Heo
  2016-02-24 22:02   ` [PATCH 07/16] cgroup: factor out cgroup_apply_control_disable() from cgroup_subtree_control_write() Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 12/16] cgroup: combine cgroup_mutex locking and offline css draining Tejun Heo
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

Factor out css enabling and showing into cgroup_apply_control_enable().

* Nest subsystem walk inside child walk.  The child walk will later be
  converted to subtree walk which is a bit more expensive.

* Instead of operating on the differential masks @css_enable, simply
  enable or show csses according to the current cgroup_control() and
  cgroup_ss_mask().  This leads to the same result and is simpler and
  more robust.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 77 +++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 30 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4b8a6eb..bcf0bad 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3012,6 +3012,49 @@ static bool cgroup_drain_offline(struct cgroup *cgrp)
 }
 
 /**
+ * cgroup_apply_control_enable - enable or show csses according to control
+ * @cgrp: parent of the target cgroups
+ *
+ * Walk @cgrp's children and create new csses or make the existing ones
+ * visible.  A css is created invisible if it's being implicitly enabled
+ * through dependency.  An invisible css is made visible when the userland
+ * explicitly enables it.
+ *
+ * Returns 0 on success, -errno on failure.  On failure, csses which have
+ * been processed already aren't cleaned up.  The caller is responsible for
+ * cleaning up with cgroup_apply_control_disble().
+ */
+static int cgroup_apply_control_enable(struct cgroup *cgrp)
+{
+	struct cgroup *dsct;
+	struct cgroup_subsys *ss;
+	int ssid, ret;
+
+	cgroup_for_each_live_child(dsct, cgrp) {
+		for_each_subsys(ss, ssid) {
+			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
+
+			if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
+				continue;
+
+			if (!css) {
+				css = css_create(dsct, ss);
+				if (IS_ERR(css))
+					return PTR_ERR(css);
+			}
+
+			if (cgroup_control(dsct) & (1 << ss->id)) {
+				ret = css_populate_dir(css, NULL);
+				if (ret)
+					return ret;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/**
  * cgroup_apply_control_disable - kill or hide csses according to control
  * @cgrp: parent of the target cgroups
  *
@@ -3156,36 +3199,10 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	cgrp->subtree_control = new_sc;
 	cgrp->subtree_ss_mask = new_ss;
 
-	/*
-	 * Create new csses or make the existing ones visible.  A css is
-	 * created invisible if it's being implicitly enabled through
-	 * dependency.  An invisible css is made visible when the userland
-	 * explicitly enables it.
-	 */
-	do_each_subsys_mask(ss, ssid, enable) {
-		cgroup_for_each_live_child(child, cgrp) {
-			if (css_enable & (1 << ssid)) {
-				struct cgroup_subsys_state *css;
-
-				css = css_create(child, ss);
-				if (IS_ERR(css)) {
-					ret = PTR_ERR(css);
-					goto err_undo_css;
-				}
-
-				if (cgrp->subtree_control & (1 << ssid)) {
-					ret = css_populate_dir(css, NULL);
-					if (ret)
-						goto err_undo_css;
-				}
-			} else {
-				ret = css_populate_dir(cgroup_css(child, ss),
-						       NULL);
-				if (ret)
-					goto err_undo_css;
-			}
-		}
-	} while_each_subsys_mask();
+	/* prepare csses */
+	ret = cgroup_apply_control_enable(cgrp);
+	if (ret)
+		goto err_undo_css;
 
 	/*
 	 * At this point, cgroup_e_css() results reflect the new csses
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/16] cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (5 preceding siblings ...)
  2016-02-24 22:02 ` [PATCH 06/16] cgroup: factor out cgroup_drain_offline() from cgroup_subtree_control_write() Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
  2016-02-24 22:02 ` [PATCH 10/16] cgroup: introduce cgroup_{save|propagate|restore}_control() Tejun Heo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

The three factored out css management operations -
cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() -
only depend on the current state of the target cgroups and idempotent
and thus can be easily made to operate on the subtree instead of the
immediate children.

This patch introduces the iterators which walk live subtree and
converts the three functions to operate on the subtree including self
instead of the children.  While this leads to spurious walking and be
slightly more expensive, it will allow them to be used for wider scope
of operations.

Note that cgroup_drain_offline() now tests for whether a css is dying
before trying to drain it.  This is to avoid trying to drain live
csses as there can be mix of live and dying csses in a subtree unlike
children of the same parent.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bcf0bad..07ea063 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -573,6 +573,24 @@ static int notify_on_release(const struct cgroup *cgrp)
 			;						\
 		else
 
+/* walk live descendants in preorder */
+#define cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp)		\
+	css_for_each_descendant_pre((d_css), cgroup_css((cgrp), NULL))	\
+		if (({ lockdep_assert_held(&cgroup_mutex);		\
+		       (dsct) = (d_css)->cgroup;			\
+		       cgroup_is_dead(dsct); }))			\
+			;						\
+		else
+
+/* walk live descendants in postorder */
+#define cgroup_for_each_live_descendant_post(dsct, d_css, cgrp)		\
+	css_for_each_descendant_post((d_css), cgroup_css((cgrp), NULL))	\
+		if (({ lockdep_assert_held(&cgroup_mutex);		\
+		       (dsct) = (d_css)->cgroup;			\
+		       cgroup_is_dead(dsct); }))			\
+			;						\
+		else
+
 static void cgroup_release_agent(struct work_struct *work);
 static void check_for_release(struct cgroup *cgrp);
 
@@ -2966,11 +2984,11 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 
 /**
  * cgroup_drain_offline - wait for previously offlined csses to go away
- * @cgrp: parent of the target cgroups
+ * @cgrp: root of the target subtree
  *
  * Because css offlining is asynchronous, userland may try to re-enable a
  * controller while the previous css is still around.  This function drains
- * the previous css instances of @cgrp's children.
+ * the previous css instances of @cgrp's subtree.
  *
  * Must be called with cgroup_mutex held.  Returns %false if there were no
  * dying css instances.  Returns %true if there were one or more and this
@@ -2981,17 +2999,18 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 static bool cgroup_drain_offline(struct cgroup *cgrp)
 {
 	struct cgroup *dsct;
+	struct cgroup_subsys_state *d_css;
 	struct cgroup_subsys *ss;
 	int ssid;
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	cgroup_for_each_live_child(dsct, cgrp) {
+	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
 		for_each_subsys(ss, ssid) {
 			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
 			DEFINE_WAIT(wait);
 
-			if (!css)
+			if (!css || !percpu_ref_is_dying(&css->refcnt))
 				continue;
 
 			cgroup_get(dsct);
@@ -3013,9 +3032,9 @@ static bool cgroup_drain_offline(struct cgroup *cgrp)
 
 /**
  * cgroup_apply_control_enable - enable or show csses according to control
- * @cgrp: parent of the target cgroups
+ * @cgrp: root of the target subtree
  *
- * Walk @cgrp's children and create new csses or make the existing ones
+ * Walk @cgrp's subtree and create new csses or make the existing ones
  * visible.  A css is created invisible if it's being implicitly enabled
  * through dependency.  An invisible css is made visible when the userland
  * explicitly enables it.
@@ -3027,10 +3046,11 @@ static bool cgroup_drain_offline(struct cgroup *cgrp)
 static int cgroup_apply_control_enable(struct cgroup *cgrp)
 {
 	struct cgroup *dsct;
+	struct cgroup_subsys_state *d_css;
 	struct cgroup_subsys *ss;
 	int ssid, ret;
 
-	cgroup_for_each_live_child(dsct, cgrp) {
+	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
 		for_each_subsys(ss, ssid) {
 			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
 
@@ -3056,9 +3076,9 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 
 /**
  * cgroup_apply_control_disable - kill or hide csses according to control
- * @cgrp: parent of the target cgroups
+ * @cgrp: root of the target subtree
  *
- * Walk @cgrp's children and kill and hide csses so that they match
+ * Walk @cgrp's subtree and kill and hide csses so that they match
  * cgroup_ss_mask() and cgroup_visible_mask().
  *
  * A css is hidden when the userland requests it to be disabled while other
@@ -3070,10 +3090,11 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 static void cgroup_apply_control_disable(struct cgroup *cgrp)
 {
 	struct cgroup *dsct;
+	struct cgroup_subsys_state *d_css;
 	struct cgroup_subsys *ss;
 	int ssid;
 
-	cgroup_for_each_live_child(dsct, cgrp) {
+	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
 		for_each_subsys(ss, ssid) {
 			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/16] cgroup: introduce cgroup_{save|propagate|restore}_control()
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (6 preceding siblings ...)
  2016-02-24 22:02 ` [PATCH 09/16] cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
  2016-02-24 22:02 ` [PATCH 11/16] cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write() Tejun Heo
  2016-03-03 14:59 ` [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

While controllers are being enabled and disabled in
cgroup_subtree_control_write(), the original subsystem masks are
stashed in local variables so that they can be restored if the
operation fails in the middle.

This patch adds dedicated fields to struct cgroup to be used instead
of the local variables and implements functions to stash the current
values, propagate the changes and restore them recursively.  Combined
with the previous changes, this makes subsystem management operations
fully recursive and modularlized.  This will be used to expand cgroup
core functionalities.

While at it, remove now unused @css_enable and @css_disable from
cgroup_subtree_control_write().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/cgroup-defs.h |  2 ++
 kernel/cgroup.c             | 82 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 7593c1a..aae8c94 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -260,6 +260,8 @@ struct cgroup {
 	 */
 	u16 subtree_control;
 	u16 subtree_ss_mask;
+	u16 old_subtree_control;
+	u16 old_subtree_ss_mask;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 07ea063..2a55043 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3031,6 +3031,62 @@ static bool cgroup_drain_offline(struct cgroup *cgrp)
 }
 
 /**
+ * cgroup_save_control - save control masks of a subtree
+ * @cgrp: root of the target subtree
+ *
+ * Save ->subtree_control and ->subtree_ss_mask to the respective old_
+ * prefixed fields for @cgrp's subtree including @cgrp itself.
+ */
+static void cgroup_save_control(struct cgroup *cgrp)
+{
+	struct cgroup *dsct;
+	struct cgroup_subsys_state *d_css;
+
+	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
+		dsct->old_subtree_control = dsct->subtree_control;
+		dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
+	}
+}
+
+/**
+ * cgroup_propagate_control - refresh control masks of a subtree
+ * @cgrp: root of the target subtree
+ *
+ * For @cgrp and its subtree, ensure ->subtree_ss_mask matches
+ * ->subtree_control and propagate controller availability through the
+ * subtree so that descendants don't have unavailable controllers enabled.
+ */
+static void cgroup_propagate_control(struct cgroup *cgrp)
+{
+	struct cgroup *dsct;
+	struct cgroup_subsys_state *d_css;
+
+	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
+		dsct->subtree_control &= cgroup_control(dsct);
+		dsct->subtree_ss_mask = cgroup_calc_subtree_ss_mask(dsct,
+							dsct->subtree_control);
+	}
+}
+
+/**
+ * cgroup_restore_control - restore control masks of a subtree
+ * @cgrp: root of the target subtree
+ *
+ * Restore ->subtree_control and ->subtree_ss_mask from the respective old_
+ * prefixed fields for @cgrp's subtree including @cgrp itself.
+ */
+static void cgroup_restore_control(struct cgroup *cgrp)
+{
+	struct cgroup *dsct;
+	struct cgroup_subsys_state *d_css;
+
+	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
+		dsct->subtree_control = dsct->old_subtree_control;
+		dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
+	}
+}
+
+/**
  * cgroup_apply_control_enable - enable or show csses according to control
  * @cgrp: root of the target subtree
  *
@@ -3118,7 +3174,6 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 					    loff_t off)
 {
 	u16 enable = 0, disable = 0;
-	u16 css_enable, css_disable, old_sc, new_sc, old_ss, new_ss;
 	struct cgroup *cgrp, *child;
 	struct cgroup_subsys *ss;
 	char *tok;
@@ -3202,25 +3257,14 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 		return restart_syscall();
 	}
 
-	/*
-	 * Update subsys masks and calculate what needs to be done.  More
-	 * subsystems than specified may need to be enabled or disabled
-	 * depending on subsystem dependencies.
-	 */
-	old_sc = cgrp->subtree_control;
-	old_ss = cgrp->subtree_ss_mask;
-	new_sc = (old_sc | enable) & ~disable;
-	new_ss = cgroup_calc_subtree_ss_mask(cgrp, new_sc);
+	/* save and update control masks and prepare csses */
+	cgroup_save_control(cgrp);
 
-	css_enable = ~old_ss & new_ss;
-	css_disable = old_ss & ~new_ss;
-	enable |= css_enable;
-	disable |= css_disable;
+	cgrp->subtree_control |= enable;
+	cgrp->subtree_control &= ~disable;
 
-	cgrp->subtree_control = new_sc;
-	cgrp->subtree_ss_mask = new_ss;
+	cgroup_propagate_control(cgrp);
 
-	/* prepare csses */
 	ret = cgroup_apply_control_enable(cgrp);
 	if (ret)
 		goto err_undo_css;
@@ -3245,9 +3289,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 
 err_undo_css:
 	/* restore masks and shoot down new csses */
-	cgrp->subtree_control = old_sc;
-	cgrp->subtree_ss_mask = old_ss;
-
+	cgroup_restore_control(cgrp);
 	cgroup_apply_control_disable(cgrp);
 
 	goto out_unlock;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/16] cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (7 preceding siblings ...)
  2016-02-24 22:02 ` [PATCH 10/16] cgroup: introduce cgroup_{save|propagate|restore}_control() Tejun Heo
@ 2016-02-24 22:02 ` Tejun Heo
  2016-03-03 14:59 ` [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team, Tejun Heo

Factor out cgroup_{apply|finalize}_control() so that control mask
update can be done in several simple steps.  This patch doesn't
introduce behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup.c | 81 +++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 23 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2a55043..baa0e05 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3168,6 +3168,62 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
 	}
 }
 
+/**
+ * cgroup_apply_control - apply control mask updates to the subtree
+ * @cgrp: root of the target subtree
+ *
+ * subsystems can be enabled and disabled in a subtree using the following
+ * steps.
+ *
+ * 1. Call cgroup_save_control() to stash the current state.
+ * 2. Update ->subtree_control masks in the subtree as desired.
+ * 3. Call cgroup_apply_control() to apply the changes.
+ * 4. Optionally perform other related operations.
+ * 5. Call cgroup_finalize_control() to finish up.
+ *
+ * This function implements step 3 and propagates the mask changes
+ * throughout @cgrp's subtree, updates csses accordingly and perform
+ * process migrations.
+ */
+static int cgroup_apply_control(struct cgroup *cgrp)
+{
+	int ret;
+
+	cgroup_propagate_control(cgrp);
+
+	ret = cgroup_apply_control_enable(cgrp);
+	if (ret)
+		return ret;
+
+	/*
+	 * At this point, cgroup_e_css() results reflect the new csses
+	 * making the following cgroup_update_dfl_csses() properly update
+	 * css associations of all tasks in the subtree.
+	 */
+	ret = cgroup_update_dfl_csses(cgrp);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+/**
+ * cgroup_finalize_control - finalize control mask update
+ * @cgrp: root of the target subtree
+ * @ret: the result of the update
+ *
+ * Finalize control mask update.  See cgroup_apply_control() for more info.
+ */
+static void cgroup_finalize_control(struct cgroup *cgrp, int ret)
+{
+	if (ret) {
+		cgroup_restore_control(cgrp);
+		cgroup_propagate_control(cgrp);
+	}
+
+	cgroup_apply_control_disable(cgrp);
+}
+
 /* change the enabled child controllers for a cgroup in the default hierarchy */
 static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 					    char *buf, size_t nbytes,
@@ -3263,36 +3319,15 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 	cgrp->subtree_control |= enable;
 	cgrp->subtree_control &= ~disable;
 
-	cgroup_propagate_control(cgrp);
+	ret = cgroup_apply_control(cgrp);
 
-	ret = cgroup_apply_control_enable(cgrp);
-	if (ret)
-		goto err_undo_css;
-
-	/*
-	 * At this point, cgroup_e_css() results reflect the new csses
-	 * making the following cgroup_update_dfl_csses() properly update
-	 * css associations of all tasks in the subtree.
-	 */
-	ret = cgroup_update_dfl_csses(cgrp);
-	if (ret)
-		goto err_undo_css;
-
-	/* all tasks are migrated out of disabled csses, commit disable */
-	cgroup_apply_control_disable(cgrp);
+	cgroup_finalize_control(cgrp, ret);
 
 	kernfs_activate(cgrp->kn);
 	ret = 0;
 out_unlock:
 	cgroup_kn_unlock(of->kn);
 	return ret ?: nbytes;
-
-err_undo_css:
-	/* restore masks and shoot down new csses */
-	cgroup_restore_control(cgrp);
-	cgroup_apply_control_disable(cgrp);
-
-	goto out_unlock;
 }
 
 static int cgroup_events_show(struct seq_file *seq, void *v)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/16] cgroup: combine cgroup_mutex locking and offline css draining
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-02-24 22:02   ` [PATCH 08/16] cgroup: factor out cgroup_apply_control_enable() " Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 13/16] cgroup: use cgroup_apply_enable_control() in cgroup creation path Tejun Heo
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

cgroup_drain_offline() is used to wait for csses being offlined to
uninstall itself from cgroup->subsys[] array so that new csses can be
installed.  The function's only user, cgroup_subtree_control_write(),
calls it after performing some checks and restarts the whole process
via restart_syscall() if draining has to release cgroup_mutex to wait.

This can be simplified by draining before other synchronized
operations so that there's nothing to restart.  This patch converts
cgroup_drain_offline() to cgroup_lock_and_drain_offline() which
performs both locking and draining and updates cgroup_kn_lock_live()
use it instead of cgroup_mutex() if requested.  This combined locking
and draining operations are easier to use and less error-prone.

While at it, add WARNs in control_apply functions which triggers if
the subtree isn't properly drained.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 55 +++++++++++++++++++++++++++----------------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index baa0e05..b1c6fe2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -220,6 +220,7 @@ static struct cftype cgroup_dfl_base_files[];
 static struct cftype cgroup_legacy_base_files[];
 
 static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
+static void cgroup_lock_and_drain_offline(struct cgroup *cgrp);
 static void css_task_iter_advance(struct css_task_iter *it);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
 static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
@@ -1390,19 +1391,22 @@ static void cgroup_kn_unlock(struct kernfs_node *kn)
 /**
  * cgroup_kn_lock_live - locking helper for cgroup kernfs methods
  * @kn: the kernfs_node being serviced
+ * @drain_offline: perform offline draining on the cgroup
  *
  * This helper is to be used by a cgroup kernfs method currently servicing
  * @kn.  It breaks the active protection, performs cgroup locking and
  * verifies that the associated cgroup is alive.  Returns the cgroup if
  * alive; otherwise, %NULL.  A successful return should be undone by a
- * matching cgroup_kn_unlock() invocation.
+ * matching cgroup_kn_unlock() invocation.  If @drain_offline is %true, the
+ * cgroup is drained of offlining csses before return.
  *
  * Any cgroup kernfs method implementation which requires locking the
  * associated cgroup should use this helper.  It avoids nesting cgroup
  * locking under kernfs active protection and allows all kernfs operations
  * including self-removal.
  */
-static struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn)
+static struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn,
+					  bool drain_offline)
 {
 	struct cgroup *cgrp;
 
@@ -1421,7 +1425,10 @@ static struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn)
 		return NULL;
 	kernfs_break_active_protection(kn);
 
-	mutex_lock(&cgroup_mutex);
+	if (drain_offline)
+		cgroup_lock_and_drain_offline(cgrp);
+	else
+		mutex_lock(&cgroup_mutex);
 
 	if (!cgroup_is_dead(cgrp))
 		return cgrp;
@@ -2760,7 +2767,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
 		return -EINVAL;
 
-	cgrp = cgroup_kn_lock_live(of->kn);
+	cgrp = cgroup_kn_lock_live(of->kn, false);
 	if (!cgrp)
 		return -ENODEV;
 
@@ -2858,7 +2865,7 @@ static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
 
 	BUILD_BUG_ON(sizeof(cgrp->root->release_agent_path) < PATH_MAX);
 
-	cgrp = cgroup_kn_lock_live(of->kn);
+	cgrp = cgroup_kn_lock_live(of->kn, false);
 	if (!cgrp)
 		return -ENODEV;
 	spin_lock(&release_agent_path_lock);
@@ -2983,27 +2990,23 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 }
 
 /**
- * cgroup_drain_offline - wait for previously offlined csses to go away
+ * cgroup_lock_and_drain_offline - lock cgroup_mutex and drain offlined csses
  * @cgrp: root of the target subtree
  *
  * Because css offlining is asynchronous, userland may try to re-enable a
- * controller while the previous css is still around.  This function drains
- * the previous css instances of @cgrp's subtree.
- *
- * Must be called with cgroup_mutex held.  Returns %false if there were no
- * dying css instances.  Returns %true if there were one or more and this
- * function waited.  On %true return, cgroup_mutex has been dropped and
- * re-acquired inbetween which anything could have happened.  The caller
- * typically would have to start over.
+ * controller while the previous css is still around.  This function grabs
+ * cgroup_mutex and drains the previous css instances of @cgrp's subtree.
  */
-static bool cgroup_drain_offline(struct cgroup *cgrp)
+static void cgroup_lock_and_drain_offline(struct cgroup *cgrp)
+	__acquires(&cgroup_mutex)
 {
 	struct cgroup *dsct;
 	struct cgroup_subsys_state *d_css;
 	struct cgroup_subsys *ss;
 	int ssid;
 
-	lockdep_assert_held(&cgroup_mutex);
+restart:
+	mutex_lock(&cgroup_mutex);
 
 	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
 		for_each_subsys(ss, ssid) {
@@ -3020,14 +3023,11 @@ static bool cgroup_drain_offline(struct cgroup *cgrp)
 			mutex_unlock(&cgroup_mutex);
 			schedule();
 			finish_wait(&dsct->offline_waitq, &wait);
-			mutex_lock(&cgroup_mutex);
 
 			cgroup_put(dsct);
-			return true;
+			goto restart;
 		}
 	}
-
-	return false;
 }
 
 /**
@@ -3110,6 +3110,8 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 		for_each_subsys(ss, ssid) {
 			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
 
+			WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt));
+
 			if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
 				continue;
 
@@ -3154,6 +3156,8 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
 		for_each_subsys(ss, ssid) {
 			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
 
+			WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt));
+
 			if (!css)
 				continue;
 
@@ -3263,7 +3267,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 			return -EINVAL;
 	}
 
-	cgrp = cgroup_kn_lock_live(of->kn);
+	cgrp = cgroup_kn_lock_live(of->kn, true);
 	if (!cgrp)
 		return -ENODEV;
 
@@ -3308,11 +3312,6 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
 		goto out_unlock;
 	}
 
-	if (cgroup_drain_offline(cgrp)) {
-		cgroup_kn_unlock(of->kn);
-		return restart_syscall();
-	}
-
 	/* save and update control masks and prepare csses */
 	cgroup_save_control(cgrp);
 
@@ -5136,7 +5135,7 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (strchr(name, '\n'))
 		return -EINVAL;
 
-	parent = cgroup_kn_lock_live(parent_kn);
+	parent = cgroup_kn_lock_live(parent_kn, false);
 	if (!parent)
 		return -ENODEV;
 
@@ -5335,7 +5334,7 @@ static int cgroup_rmdir(struct kernfs_node *kn)
 	struct cgroup *cgrp;
 	int ret = 0;
 
-	cgrp = cgroup_kn_lock_live(kn);
+	cgrp = cgroup_kn_lock_live(kn, false);
 	if (!cgrp)
 		return 0;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 13/16] cgroup: use cgroup_apply_enable_control() in cgroup creation path
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-02-24 22:02   ` [PATCH 12/16] cgroup: combine cgroup_mutex locking and offline css draining Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 14/16] cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends Tejun Heo
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

cgroup_create() manually updates control masks and creates child csses
which cgroup_mkdir() then manually populates.  Both can be simplified
by using cgroup_apply_enable_control() and friends.  The only catch is
that it calls css_populate_dir() with NULL cgroup->kn during
cgroup_create().  This is worked around by making the function noop on
NULL kn.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b1c6fe2..efeaa54 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1489,7 +1489,7 @@ static int css_populate_dir(struct cgroup_subsys_state *css,
 	struct cftype *cfts, *failed_cfts;
 	int ret;
 
-	if (css->flags & CSS_VISIBLE)
+	if ((css->flags & CSS_VISIBLE) || !cgrp->kn)
 		return 0;
 
 	if (!css->ss) {
@@ -5038,10 +5038,9 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 static struct cgroup *cgroup_create(struct cgroup *parent)
 {
 	struct cgroup_root *root = parent->root;
-	struct cgroup_subsys *ss;
 	struct cgroup *cgrp, *tcgrp;
 	int level = parent->level + 1;
-	int ssid, ret;
+	int ret;
 
 	/* allocate the cgroup and its ID, 0 is reserved for the root */
 	cgrp = kzalloc(sizeof(*cgrp) +
@@ -5091,25 +5090,19 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	 */
 	cgroup_idr_replace(&root->cgroup_idr, cgrp, cgrp->id);
 
-	/* create the csses */
-	do_each_subsys_mask(ss, ssid, cgroup_ss_mask(cgrp)) {
-		struct cgroup_subsys_state *css;
-
-		css = css_create(cgrp, ss);
-		if (IS_ERR(css)) {
-			ret = PTR_ERR(css);
-			goto out_destroy;
-		}
-	} while_each_subsys_mask();
-
 	/*
 	 * On the default hierarchy, a child doesn't automatically inherit
 	 * subtree_control from the parent.  Each is configured manually.
 	 */
-	if (!cgroup_on_dfl(cgrp)) {
+	if (!cgroup_on_dfl(cgrp))
 		cgrp->subtree_control = cgroup_control(cgrp);
-		cgroup_refresh_subtree_ss_mask(cgrp);
-	}
+
+	cgroup_propagate_control(cgrp);
+
+	/* @cgrp doesn't have dir yet so the following will only create csses */
+	ret = cgroup_apply_control_enable(cgrp);
+	if (ret)
+		goto out_destroy;
 
 	return cgrp;
 
@@ -5127,9 +5120,8 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 			umode_t mode)
 {
 	struct cgroup *parent, *cgrp;
-	struct cgroup_subsys *ss;
 	struct kernfs_node *kn;
-	int ssid, ret;
+	int ret;
 
 	/* do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable */
 	if (strchr(name, '\n'))
@@ -5167,11 +5159,9 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (ret)
 		goto out_destroy;
 
-	do_each_subsys_mask(ss, ssid, cgroup_control(cgrp)) {
-		ret = css_populate_dir(cgroup_css(cgrp, ss), NULL);
-		if (ret)
-			goto out_destroy;
-	} while_each_subsys_mask();
+	ret = cgroup_apply_control_enable(cgrp);
+	if (ret)
+		goto out_destroy;
 
 	/* let's create and online css's */
 	kernfs_activate(kn);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 14/16] cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-02-24 22:02   ` [PATCH 13/16] cgroup: use cgroup_apply_enable_control() in cgroup creation path Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 15/16] cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask Tejun Heo
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

rebind_subsystem() open codes quite a bit of css and interface file
manipulations.  It tries to be fail-safe but doesn't quite achieve it.
It can be greatly simplified by using the new css management helpers.
This patch reimplements rebind_subsytsems() using
cgroup_apply_control() and friends.

* The half-baked rollback on file creation failure is dropped.  It is
  an extremely cold path, failure isn't critical, and, aside from
  kernel bugs, the only reason it can fail is memory allocation
  failure which pretty much doesn't happen for small allocations.

* As cgroup_apply_control_disable() is now used to clean up root
  cgroup on rebind, make sure that it doesn't end up killing root
  csses.

* All callers of rebind_subsystems() are updated to use
  cgroup_lock_and_drain_offline() as the apply_control functions
  require drained subtree.

* This leaves cgroup_refresh_subtree_ss_mask() without any user.
  Removed.

* css_populate_dir() and css_clear_dir() no longer needs
  @cgrp_override parameter.  Dropped.

* While at it, add WARN_ON() to rebind_subsystem() calls which are
  expected to always succeed just in case.

While the rules visible to userland aren't changed, this
reimplementation not only simplifies rebind_subsystems() but also
allows it to disable and enable csses recursively.  This can be used
to implement more flexible rebinding.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 107 +++++++++++++++-----------------------------------------
 1 file changed, 28 insertions(+), 79 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index efeaa54..01792a5 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -221,6 +221,8 @@ static struct cftype cgroup_legacy_base_files[];
 
 static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 static void cgroup_lock_and_drain_offline(struct cgroup *cgrp);
+static int cgroup_apply_control(struct cgroup *cgrp);
+static void cgroup_finalize_control(struct cgroup *cgrp, int ret);
 static void css_task_iter_advance(struct css_task_iter *it);
 static int cgroup_destroy_locked(struct cgroup *cgrp);
 static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
@@ -1159,13 +1161,13 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	struct cgroup *cgrp = &root->cgrp;
 	struct cgrp_cset_link *link, *tmp_link;
 
-	mutex_lock(&cgroup_mutex);
+	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
 	BUG_ON(atomic_read(&root->nr_cgrps));
 	BUG_ON(!list_empty(&cgrp->self.children));
 
 	/* Rebind all subsystems back to the default hierarchy */
-	rebind_subsystems(&cgrp_dfl_root, root->subsys_mask);
+	WARN_ON(rebind_subsystems(&cgrp_dfl_root, root->subsys_mask));
 
 	/*
 	 * Release all the links from cset_links to this hierarchy's
@@ -1351,19 +1353,6 @@ static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control)
 }
 
 /**
- * cgroup_refresh_subtree_ss_mask - update subtree_ss_mask
- * @cgrp: the target cgroup
- *
- * Update @cgrp->subtree_ss_mask according to the current
- * @cgrp->subtree_control using cgroup_calc_subtree_ss_mask().
- */
-static void cgroup_refresh_subtree_ss_mask(struct cgroup *cgrp)
-{
-	cgrp->subtree_ss_mask =
-		cgroup_calc_subtree_ss_mask(cgrp, cgrp->subtree_control);
-}
-
-/**
  * cgroup_kn_unlock - unlocking helper for cgroup kernfs methods
  * @kn: the kernfs_node being serviced
  *
@@ -1458,12 +1447,10 @@ static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
 /**
  * css_clear_dir - remove subsys files in a cgroup directory
  * @css: taget css
- * @cgrp_override: specify if target cgroup is different from css->cgroup
  */
-static void css_clear_dir(struct cgroup_subsys_state *css,
-			  struct cgroup *cgrp_override)
+static void css_clear_dir(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = cgrp_override ?: css->cgroup;
+	struct cgroup *cgrp = css->cgroup;
 	struct cftype *cfts;
 
 	if (!(css->flags & CSS_VISIBLE))
@@ -1478,14 +1465,12 @@ static void css_clear_dir(struct cgroup_subsys_state *css,
 /**
  * css_populate_dir - create subsys files in a cgroup directory
  * @css: target css
- * @cgrp_overried: specify if target cgroup is different from css->cgroup
  *
  * On failure, no file is added.
  */
-static int css_populate_dir(struct cgroup_subsys_state *css,
-			    struct cgroup *cgrp_override)
+static int css_populate_dir(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = cgrp_override ?: css->cgroup;
+	struct cgroup *cgrp = css->cgroup;
 	struct cftype *cfts, *failed_cfts;
 	int ret;
 
@@ -1525,7 +1510,6 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 {
 	struct cgroup *dcgrp = &dst_root->cgrp;
 	struct cgroup_subsys *ss;
-	u16 tmp_ss_mask;
 	int ssid, i, ret;
 
 	lockdep_assert_held(&cgroup_mutex);
@@ -1540,46 +1524,6 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 			return -EBUSY;
 	} while_each_subsys_mask();
 
-	/* skip creating root files on dfl_root for inhibited subsystems */
-	tmp_ss_mask = ss_mask;
-	if (dst_root == &cgrp_dfl_root)
-		tmp_ss_mask &= ~cgrp_dfl_inhibit_ss_mask;
-
-	do_each_subsys_mask(ss, ssid, tmp_ss_mask) {
-		struct cgroup *scgrp = &ss->root->cgrp;
-		int tssid;
-
-		ret = css_populate_dir(cgroup_css(scgrp, ss), dcgrp);
-		if (!ret)
-			continue;
-
-		/*
-		 * Rebinding back to the default root is not allowed to
-		 * fail.  Using both default and non-default roots should
-		 * be rare.  Moving subsystems back and forth even more so.
-		 * Just warn about it and continue.
-		 */
-		if (dst_root == &cgrp_dfl_root) {
-			if (cgrp_dfl_visible) {
-				pr_warn("failed to create files (%d) while rebinding 0x%x to default root\n",
-					ret, ss_mask);
-				pr_warn("you may retry by moving them to a different hierarchy and unbinding\n");
-			}
-			continue;
-		}
-
-		do_each_subsys_mask(ss, tssid, tmp_ss_mask) {
-			if (tssid == ssid)
-				break;
-			css_clear_dir(cgroup_css(scgrp, ss), dcgrp);
-		} while_each_subsys_mask();
-		return ret;
-	} while_each_subsys_mask();
-
-	/*
-	 * Nothing can fail from this point on.  Remove files for the
-	 * removed subsystems and rebind each subsystem.
-	 */
 	do_each_subsys_mask(ss, ssid, ss_mask) {
 		struct cgroup_root *src_root = ss->root;
 		struct cgroup *scgrp = &src_root->cgrp;
@@ -1588,8 +1532,12 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 
 		WARN_ON(!css || cgroup_css(dcgrp, ss));
 
-		css_clear_dir(css, NULL);
+		/* disable from the source */
+		src_root->subsys_mask &= ~(1 << ssid);
+		WARN_ON(cgroup_apply_control(scgrp));
+		cgroup_finalize_control(scgrp, 0);
 
+		/* rebind */
 		RCU_INIT_POINTER(scgrp->subsys[ssid], NULL);
 		rcu_assign_pointer(dcgrp->subsys[ssid], css);
 		ss->root = dst_root;
@@ -1601,20 +1549,20 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 				       &dcgrp->e_csets[ss->id]);
 		spin_unlock_bh(&css_set_lock);
 
-		src_root->subsys_mask &= ~(1 << ssid);
-		scgrp->subtree_control &= ~(1 << ssid);
-		cgroup_refresh_subtree_ss_mask(scgrp);
-
 		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
 		if (dst_root == &cgrp_dfl_root) {
 			static_branch_enable(cgroup_subsys_on_dfl_key[ssid]);
 		} else {
 			dcgrp->subtree_control |= 1 << ssid;
-			cgroup_refresh_subtree_ss_mask(dcgrp);
 			static_branch_disable(cgroup_subsys_on_dfl_key[ssid]);
 		}
 
+		ret = cgroup_apply_control(dcgrp);
+		if (ret)
+			pr_warn("partial failure to rebind %s controller (err=%d)\n",
+				ss->name, ret);
+
 		if (ss->bind)
 			ss->bind(css);
 	} while_each_subsys_mask();
@@ -1806,7 +1754,7 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
 		return -EINVAL;
 	}
 
-	mutex_lock(&cgroup_mutex);
+	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
 	/* See what subsystems are wanted */
 	ret = parse_cgroupfs_options(data, &opts);
@@ -1839,7 +1787,7 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
 	if (ret)
 		goto out_unlock;
 
-	rebind_subsystems(&cgrp_dfl_root, removed_mask);
+	WARN_ON(rebind_subsystems(&cgrp_dfl_root, removed_mask));
 
 	if (opts.release_agent) {
 		spin_lock(&release_agent_path_lock);
@@ -1990,7 +1938,7 @@ static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	}
 	root_cgrp->kn = root->kf_root->kn;
 
-	ret = css_populate_dir(&root_cgrp->self, NULL);
+	ret = css_populate_dir(&root_cgrp->self);
 	if (ret)
 		goto destroy_root;
 
@@ -2069,7 +2017,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_mount;
 	}
 
-	mutex_lock(&cgroup_mutex);
+	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
 	/* First find the desired set of subsystems */
 	ret = parse_cgroupfs_options(data, &opts);
@@ -3122,7 +3070,7 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 			}
 
 			if (cgroup_control(dsct) & (1 << ss->id)) {
-				ret = css_populate_dir(css, NULL);
+				ret = css_populate_dir(css);
 				if (ret)
 					return ret;
 			}
@@ -3161,10 +3109,11 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
 			if (!css)
 				continue;
 
-			if (!(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+			if (css->parent &&
+			    !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
 				kill_css(css);
 			} else if (!(cgroup_control(dsct) & (1 << ss->id))) {
-				css_clear_dir(css, NULL);
+				css_clear_dir(css);
 				if (ss->css_reset)
 					ss->css_reset(css);
 			}
@@ -5155,7 +5104,7 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (ret)
 		goto out_destroy;
 
-	ret = css_populate_dir(&cgrp->self, NULL);
+	ret = css_populate_dir(&cgrp->self);
 	if (ret)
 		goto out_destroy;
 
@@ -5227,7 +5176,7 @@ static void kill_css(struct cgroup_subsys_state *css)
 	 * This must happen before css is disassociated with its cgroup.
 	 * See seq_css() for details.
 	 */
-	css_clear_dir(css, NULL);
+	css_clear_dir(css);
 
 	/*
 	 * Killing would put the base ref, but we need to keep it alive
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 15/16] cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-02-24 22:02   ` [PATCH 14/16] cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-02-24 22:02   ` [PATCH 16/16] cgroup: allocate 2x cgrp_cset_links when setting up a new root Tejun Heo
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

cgroup_calc_subtree_ss_mask() currently takes @cgrp and
@subtree_control.  @cgrp is used for two purposes - to decide whether
it's for default hierarchy and the mask of available subsystems.  The
former doesn't matter as the results are the same regardless.  The
latter can be specified directly through a subsystem mask.

This patch makes cgroup_calc_subtree_ss_mask() perform the same
calculations for both default and legacy hierarchies and take
@this_ss_mask for available subsystems.  @cgrp is no longer used and
dropped.  This is to allow using the function in contexts where
available controllers can't be decided from the cgroup.

v2: cgroup_refres_subtree_ss_mask() is removed by a previous patch.
    Updated accordingly.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 01792a5..744e1c6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1308,18 +1308,17 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
 
 /**
  * cgroup_calc_subtree_ss_mask - calculate subtree_ss_mask
- * @cgrp: the target cgroup
  * @subtree_control: the new subtree_control mask to consider
+ * @this_ss_mask: available subsystems
  *
  * On the default hierarchy, a subsystem may request other subsystems to be
  * enabled together through its ->depends_on mask.  In such cases, more
  * subsystems than specified in "cgroup.subtree_control" may be enabled.
  *
  * This function calculates which subsystems need to be enabled if
- * @subtree_control is to be applied to @cgrp.  The returned mask is always
- * a superset of @subtree_control and follows the usual hierarchy rules.
+ * @subtree_control is to be applied while restricted to @this_ss_mask.
  */
-static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control)
+static u16 cgroup_calc_subtree_ss_mask(u16 subtree_control, u16 this_ss_mask)
 {
 	u16 cur_ss_mask = subtree_control;
 	struct cgroup_subsys *ss;
@@ -1327,9 +1326,6 @@ static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	if (!cgroup_on_dfl(cgrp))
-		return cur_ss_mask;
-
 	while (true) {
 		u16 new_ss_mask = cur_ss_mask;
 
@@ -1342,7 +1338,7 @@ static u16 cgroup_calc_subtree_ss_mask(struct cgroup *cgrp, u16 subtree_control)
 		 * happen only if some depended-upon subsystems were bound
 		 * to non-default hierarchies.
 		 */
-		new_ss_mask &= cgroup_ss_mask(cgrp);
+		new_ss_mask &= this_ss_mask;
 
 		if (new_ss_mask == cur_ss_mask)
 			break;
@@ -3011,8 +3007,9 @@ static void cgroup_propagate_control(struct cgroup *cgrp)
 
 	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
 		dsct->subtree_control &= cgroup_control(dsct);
-		dsct->subtree_ss_mask = cgroup_calc_subtree_ss_mask(dsct,
-							dsct->subtree_control);
+		dsct->subtree_ss_mask =
+			cgroup_calc_subtree_ss_mask(dsct->subtree_control,
+						    cgroup_ss_mask(dsct));
 	}
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 16/16] cgroup: allocate 2x cgrp_cset_links when setting up a new root
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-02-24 22:02   ` [PATCH 15/16] cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask Tejun Heo
@ 2016-02-24 22:02   ` Tejun Heo
  2016-03-02 18:14   ` [PATCH 17/17] cgroup: update css iteration in cgroup_update_dfl_csses() Tejun Heo
  2016-03-03  3:02   ` [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Zefan Li
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-02-24 22:02 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
	Tejun Heo

During prep, cgroup_setup_root() allocates cgrp_cset_links matching
the number of existing css_sets to later link the new root.  This is
fine for now as the only operation which can happen inbetween is
rebind_subsystems() and rebinding of empty subsystems doesn't create
new css_sets.

However, while not yet allowed, with the recent reimplementation,
rebind_subsystems() can rebind subsystems with descendant csses and
thus can create new css_sets.  This patch makes cgroup_setup_root()
allocate 2x of the existing css_sets so that later use of live
subsystem rebinding doesn't blow up.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 744e1c6..a44c123 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1914,10 +1914,11 @@ static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	/*
 	 * We're accessing css_set_count without locking css_set_lock here,
 	 * but that's OK - it can only be increased by someone holding
-	 * cgroup_lock, and that's us. The worst that can happen is that we
-	 * have some link structures left over
+	 * cgroup_lock, and that's us.  Later rebinding may disable
+	 * controllers on the default hierarchy and thus create new csets,
+	 * which can't be more than the existing ones.  Allocate 2x.
 	 */
-	ret = allocate_cgrp_cset_links(css_set_count, &tmp_links);
+	ret = allocate_cgrp_cset_links(2 * css_set_count, &tmp_links);
 	if (ret)
 		goto cancel_ref;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 17/17] cgroup: update css iteration in cgroup_update_dfl_csses()
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-02-24 22:02   ` [PATCH 16/16] cgroup: allocate 2x cgrp_cset_links when setting up a new root Tejun Heo
@ 2016-03-02 18:14   ` Tejun Heo
  2016-03-03  3:02   ` [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Zefan Li
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-03-02 18:14 UTC (permalink / raw)
  To: lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

From 9b9af540f2d55d80b4900ff94ca3d2c2980002bf Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Wed, 2 Mar 2016 13:10:41 -0500

The existing sequences of operations ensure that the offlining csses
are drained before cgroup_update_dfl_csses(), so even though
cgroup_update_dfl_csses() uses css_for_each_descendant_pre() to walk
the target cgroups, it doesn't end up operating on dead cgroups.
Also, the function explicitly excludes the subtree root from
operation.

This is fragile and inconsistent with the rest of css update
operations.  This patch updates cgroup_update_dfl_csses() to use
cgroup_for_each_live_descendant_pre() instead and include the subtree
root.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
Hello,

One extra patch in the series.  The patchset has been rebased on top
of the following two patches without any content change.

 http://lkml.kernel.org/g/20160302180712.GA11029-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org
 http://lkml.kernel.org/g/20160302180756.GB11029-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org

The git branch has been updated accordingly.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-recursive-control

Thanks.

 kernel/cgroup.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 40ed329..c63fce0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2877,16 +2877,17 @@ static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
  * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy
  * @cgrp: root of the subtree to update csses for
  *
- * @cgrp's subtree_ss_mask has changed and its subtree's (self excluded)
- * css associations need to be updated accordingly.  This function looks up
- * all css_sets which are attached to the subtree, creates the matching
- * updated css_sets and migrates the tasks to the new ones.
+ * @cgrp's control masks have changed and its subtree's css associations
+ * need to be updated accordingly.  This function looks up all css_sets
+ * which are attached to the subtree, creates the matching updated css_sets
+ * and migrates the tasks to the new ones.
  */
 static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 {
 	LIST_HEAD(preloaded_csets);
 	struct cgroup_taskset tset = CGROUP_TASKSET_INIT(tset);
-	struct cgroup_subsys_state *css;
+	struct cgroup_subsys_state *d_css;
+	struct cgroup *dsct;
 	struct css_set *src_cset;
 	int ret;
 
@@ -2896,14 +2897,10 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 
 	/* look up all csses currently attached to @cgrp's subtree */
 	spin_lock_bh(&css_set_lock);
-	css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
+	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
 		struct cgrp_cset_link *link;
 
-		/* self is not affected by subtree_ss_mask change */
-		if (css->cgroup == cgrp)
-			continue;
-
-		list_for_each_entry(link, &css->cgroup->cset_links, cset_link)
+		list_for_each_entry(link, &dsct->cset_links, cset_link)
 			cgroup_migrate_add_src(link->cset, cgrp,
 					       &preloaded_csets);
 	}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive
       [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2016-03-02 18:14   ` [PATCH 17/17] cgroup: update css iteration in cgroup_update_dfl_csses() Tejun Heo
@ 2016-03-03  3:02   ` Zefan Li
  9 siblings, 0 replies; 20+ messages in thread
From: Zefan Li @ 2016-03-03  3:02 UTC (permalink / raw)
  To: Tejun Heo, hannes-druUgvl0LCNAfugRpC6u6w
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

On 2016/2/25 6:02, Tejun Heo wrote:
> Hello,
> 
> Changes from v1[1] are
> 
> * Patches 0012-0014 and 0016 added.  They further refactor the
>   operations and convert existing css management paths to use the new
>   modular operations.
> 
> * 0005 updated to yield the correct result on v1 hierarchies too.
> 
> * 0015 updated to reflect earlier changes.
> 
> Currently, subsystem enabling and disabling in the default hierarchy
> are implemented as a long chain of interdependent operations in
> cgroup_subtree_control_write().  The function calculates what need to
> be done to its children and excute the necessary operations.  The
> function unfortunately ends up being a rather opaque blob of
> operations which is difficult to wrap one's head around or reuse.
> 
> This patchset restructures control mask update so that it's composed
> of discrete idempotent and recursive operations and convert existing
> css managment paths to use them which makes them simpler, more robust
> and capable of recursive operations.
> 
> This patchset includes the following 12 patches.
> 
>  0001-cgroup-separate-out-interface-file-creation-from-css.patch
>  0002-cgroup-explicitly-track-whether-a-cgroup_subsys_stat.patch
>  0003-cgroup-reorder-operations-in-cgroup_mkdir.patch
>  0004-cgroup-factor-out-cgroup_create-out-of-cgroup_mkdir.patch
>  0005-cgroup-introduce-cgroup_control-and-cgroup_ss_mask.patch
>  0006-cgroup-factor-out-cgroup_drain_offline-from-cgroup_s.patch
>  0007-cgroup-factor-out-cgroup_apply_control_disable-from-.patch
>  0008-cgroup-factor-out-cgroup_apply_control_enable-from-c.patch
>  0009-cgroup-make-cgroup_drain_offline-and-cgroup_apply_co.patch
>  0010-cgroup-introduce-cgroup_-save-propagate-restore-_con.patch
>  0011-cgroup-factor-out-cgroup_-apply-finalize-_control-fr.patch
>  0012-cgroup-combine-cgroup_mutex-locking-and-offline-css-.patch
>  0013-cgroup-use-cgroup_apply_enable_control-in-cgroup-cre.patch
>  0014-cgroup-reimplement-rebind_subsystems-using-cgroup_ap.patch
>  0015-cgroup-make-cgroup_calc_subtree_ss_mask-take-this_ss.patch
>  0016-cgroup-allocate-2x-cgrp_cset_links-when-setting-up-a.patch
> 
> The patchset is available in the following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-recursive-control
> 
> diffstat follows.  Thanks.
> 
>  include/linux/cgroup-defs.h |    3 
>  kernel/cgroup.c             |  760 +++++++++++++++++++++++++-------------------
>  2 files changed, 446 insertions(+), 317 deletions(-)
> 

Acked-by: Zefan Li <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive
  2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
                   ` (8 preceding siblings ...)
  2016-02-24 22:02 ` [PATCH 11/16] cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write() Tejun Heo
@ 2016-03-03 14:59 ` Tejun Heo
  9 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2016-03-03 14:59 UTC (permalink / raw)
  To: lizefan, hannes; +Cc: cgroups, linux-kernel, kernel-team

Applied to cgroup/for-4.6.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-03-03 14:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-24 22:02 [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo
2016-02-24 22:02 ` [PATCH 01/16] cgroup: separate out interface file creation from css creation Tejun Heo
2016-02-24 22:02 ` [PATCH 02/16] cgroup: explicitly track whether a cgroup_subsys_state is visible to userland Tejun Heo
     [not found] ` <1456351368-786-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2016-02-24 22:02   ` [PATCH 03/16] cgroup: reorder operations in cgroup_mkdir() Tejun Heo
2016-02-24 22:02   ` [PATCH 07/16] cgroup: factor out cgroup_apply_control_disable() from cgroup_subtree_control_write() Tejun Heo
2016-02-24 22:02   ` [PATCH 08/16] cgroup: factor out cgroup_apply_control_enable() " Tejun Heo
2016-02-24 22:02   ` [PATCH 12/16] cgroup: combine cgroup_mutex locking and offline css draining Tejun Heo
2016-02-24 22:02   ` [PATCH 13/16] cgroup: use cgroup_apply_enable_control() in cgroup creation path Tejun Heo
2016-02-24 22:02   ` [PATCH 14/16] cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends Tejun Heo
2016-02-24 22:02   ` [PATCH 15/16] cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask Tejun Heo
2016-02-24 22:02   ` [PATCH 16/16] cgroup: allocate 2x cgrp_cset_links when setting up a new root Tejun Heo
2016-03-02 18:14   ` [PATCH 17/17] cgroup: update css iteration in cgroup_update_dfl_csses() Tejun Heo
2016-03-03  3:02   ` [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Zefan Li
2016-02-24 22:02 ` [PATCH 04/16] cgroup: factor out cgroup_create() out of cgroup_mkdir() Tejun Heo
2016-02-24 22:02 ` [PATCH 05/16] cgroup: introduce cgroup_control() and cgroup_ss_mask() Tejun Heo
2016-02-24 22:02 ` [PATCH 06/16] cgroup: factor out cgroup_drain_offline() from cgroup_subtree_control_write() Tejun Heo
2016-02-24 22:02 ` [PATCH 09/16] cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive Tejun Heo
2016-02-24 22:02 ` [PATCH 10/16] cgroup: introduce cgroup_{save|propagate|restore}_control() Tejun Heo
2016-02-24 22:02 ` [PATCH 11/16] cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write() Tejun Heo
2016-03-03 14:59 ` [PATCHSET v2 cgroup/for-4.6] cgroup: make control mask updates modular and recursive Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).