public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/6] cgroups: Bindable cgroup subsystems
@ 2010-12-15  9:34 Li Zefan
  2010-12-15  9:35 ` [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys Li Zefan
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:34 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers

Stephane posted a patchset to add a perf_cgroup subsystem, so perf can
be used to monitor all threads belonging to a cgroup.

But if you have already mounted a cgroup hierarchy without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you can't use the per-cgroup perf feature.

This patchset alleviates the pain: a subsystem can now be bound to or
unbound from a hierarchy which has sub-cgroups in it.

Some subsystems still can't take advantage of this patchset, memcgroup
and cpuset for example.

For cpuset, if a hierarchy has a sub-cgroup and the cgroup has tasks,
we can't decide the sub-cgroup's cpuset.mems and cpuset.cpus automatically
if we try to bind cpuset to this hierarchy.

memcgroup uses css_get/put(), and due to some complexity, bindable
subsystems should not use css_get/put() for now.

Usage:

# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(add cpuacct to the hierarchy)
# mount -o remount,cpuset,cpuacct xxx /mnt

(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt

There's another limitation: cpuacct must not be bound to any mounted
hierarchy before the above operation. But that's not a problem, as you
can remove it from one hierarchy and bind it to another.
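
The remount rules above can be sketched in userspace C. This is only an
illustration under stated assumptions: struct subsys_desc, its
bound_elsewhere field and check_rebind() are hypothetical names, not kernel
API. The bitmask arithmetic and the -EBUSY results follow the checks that
rebind_subsystems() performs in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subsystem descriptor; flag names mirror the patchset. */
struct subsys_desc {
	bool bindable;		/* may be added to a hierarchy with children */
	bool unbindable;	/* may be removed from such a hierarchy */
	bool bound_elsewhere;	/* already attached to a mounted hierarchy */
};

/*
 * Decide whether a remount from @actual_bits to @final_bits is allowed.
 * @has_children is true when the hierarchy contains sub-cgroups.
 * Returns 0 if allowed, -EBUSY otherwise.
 */
static int check_rebind(const struct subsys_desc *ss, int nr,
			unsigned long actual_bits, unsigned long final_bits,
			bool has_children)
{
	unsigned long removed = actual_bits & ~final_bits;
	unsigned long added = final_bits & ~actual_bits;
	int i;

	assert(nr > 0 && nr <= (int)(8 * sizeof(unsigned long)));

	for (i = 0; i < nr; i++) {
		unsigned long bit = 1UL << i;

		/* An added subsystem must not be in use by another hierarchy. */
		if ((added & bit) && ss[i].bound_elsewhere)
			return -EBUSY;
		/* A hierarchy with only a root cgroup has no further limits. */
		if (!has_children)
			continue;
		if ((added & bit) && !ss[i].bindable)
			return -EBUSY;
		if ((removed & bit) && !ss[i].unbindable)
			return -EBUSY;
	}
	return 0;
}
```

With cpuset at bit 0 and a bindable and unbindable cpuacct at bit 1, going
from mask 0x1 to 0x3 and back is accepted, while adding cpuset to a
hierarchy that already has sub-cgroups is refused.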

Changelog v2:

- Fix some bugs.
- Split can_bind flag into bindable and unbindable flags.
- Provide a __css_tryget() so a bindable subsystem can pin a cgroup
  via it.
- ...

---
 Documentation/cgroups/cgroups.txt |   37 +++-
 include/linux/cgroup.h            |   39 +++-
 kernel/cgroup.c                   |  391 +++++++++++++++++++++++++++++++------
 kernel/cgroup_freezer.c           |    1 +
 kernel/sched.c                    |    2 +
 security/device_cgroup.c          |    2 +
 6 files changed, 398 insertions(+), 74 deletions(-)


* [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys
  2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
@ 2010-12-15  9:35 ` Li Zefan
  2010-12-15  9:35 ` [PATCH v2 2/6] cgroups: Allow to bind a subsystem to a cgroup hierarchy Li Zefan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:35 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers

On x86_32, sizeof(struct cgroup_subsys) shrinks from 276 bytes
to 264.
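
The effect can be reproduced in a small userspace sketch. The two structs
below are hypothetical stand-ins that hold only the flag fields, not the real
struct cgroup_subsys, and the exact sizes depend on the ABI. Assuming a 4-byte
int and 4-byte alignment, three ints plus a padded bool take 16 bytes, and
four one-bit flags fit in a single padded 4-byte slot, which is consistent
with the 12 bytes saved above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the old layout: three int flags plus a bool. */
struct flags_before {
	int active;
	int disabled;
	int early_init;
	bool use_id;
};

/* Hypothetical stand-in for the new layout: four one-bit flags. */
struct flags_after {
	bool active:1;
	bool disabled:1;
	bool early_init:1;
	bool use_id:1;
};

/* Bytes saved by the bitfield layout on the ABI this is compiled for. */
static size_t flag_bytes_saved(void)
{
	/* Guard the unsigned subtraction below against wrapping. */
	assert(sizeof(struct flags_after) <= sizeof(struct flags_before));
	return sizeof(struct flags_before) - sizeof(struct flags_after);
}
```

One trade-off of this layout is that the address of a bitfield member cannot
be taken, so code that passes a pointer to one of these flags would have to
change.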

Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed4ba11..63d953d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -481,14 +481,16 @@ struct cgroup_subsys {
 	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
 
 	int subsys_id;
-	int active;
-	int disabled;
-	int early_init;
+
+	bool active:1;
+	bool disabled:1;
+	bool early_init:1;
 	/*
 	 * True if this subsys uses ID. ID is not available before cgroup_init()
 	 * (not available in early_init time.)
 	 */
-	bool use_id;
+	bool use_id:1;
+
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
 
-- 
1.6.3


* [PATCH v2 2/6] cgroups: Allow to bind a subsystem to a cgroup hierarchy
  2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
  2010-12-15  9:35 ` [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys Li Zefan
@ 2010-12-15  9:35 ` Li Zefan
  2010-12-15  9:35 ` [PATCH v2 3/6] cgroups: Allow to unbind subsystem from " Li Zefan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:35 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers

Stephane posted a patchset to add a perf_cgroup subsystem, so perf can
be used to monitor all threads belonging to a cgroup.

But if you have already mounted a cgroup hierarchy without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you can't use the per-cgroup perf feature.

This patch alleviates the pain: a subsystem can now be bound to
a hierarchy which has sub-cgroups in it.

Matt also commented that users will appreciate this feature.

For a cgroup subsystem to become bindable, the bindable flag of
struct cgroup_subsys should be set.

But because of some constraints, not all subsystems can take advantage
of this patch. For example, we can't decide a cgroup's cpuset.mems and
cpuset.cpus automatically, so cpuset is not bindable.

Usage:

  # mount -t cgroup -o cpuset xxx /mnt
  # mkdir /mnt/tmp
  # echo $$ > /mnt/tmp/tasks

(assume cpuacct is bindable, and we add cpuacct to the hierarchy)

  # mount -o remount,cpuset,cpuacct xxx /mnt

Changelog v2:

- Add more code comments.
- Use rcu_assign_pointer in hierarchy_update_css_sets().
- Fix to nullify css pointers in hierarchy_attach_css_failed().
- Fix to call post_clone() for newly-created css.
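
The binding code in this patch visits every existing sub-cgroup with a
non-recursive pre-order walk (cgroup_walk_hierarchy()). The following
userspace sketch shows the same traversal idea under simplifying assumptions:
struct node, struct trace and the helper names are hypothetical, and plain
pointers replace the kernel's list_head links. As in the patch, the top node
itself is not visited and a non-zero callback result stops the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace node; the kernel uses list_head links instead. */
struct node {
	int id;
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
};

/* Records visit order so the traversal can be inspected. */
struct trace {
	int ids[16];
	int count;
};

/* Link @child as the last child of @parent. */
static void node_add_child(struct node *parent, struct node *child)
{
	struct node **slot = &parent->first_child;

	while (*slot)
		slot = &(*slot)->next_sibling;
	*slot = child;
	child->parent = parent;
	child->next_sibling = NULL;
}

/* Example callback: append the node id to a struct trace. */
static int record_id(struct node *n, void *data)
{
	struct trace *t = data;

	if (t->count >= 16)
		return -1;
	t->ids[t->count++] = n->id;
	return 0;
}

/*
 * Visit every descendant of @top (not @top itself) in pre-order, without
 * recursion. Returns the first non-zero callback result, or 0.
 */
static int walk_descendants(struct node *top,
			    int (*visit)(struct node *, void *), void *data)
{
	struct node *pos;
	int ret;

	assert(top && visit);
	pos = top->first_child;
	while (pos) {
		ret = visit(pos, data);
		if (ret)
			return ret;

		if (pos->first_child) {
			pos = pos->first_child;	/* descend first */
			continue;
		}
		/* Climb until a sibling exists or we are back at @top. */
		while (pos != top && !pos->next_sibling)
			pos = pos->parent;
		if (pos == top)
			break;
		pos = pos->next_sibling;
	}
	return 0;
}
```

Because a parent is always visited before its children, a callback that
attaches per-node state can rely on the parent's state already being present.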

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |    5 +
 kernel/cgroup.c        |  273 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 221 insertions(+), 57 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 63d953d..d8c4e22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -490,6 +490,11 @@ struct cgroup_subsys {
 	 * (not available in early_init time.)
 	 */
 	bool use_id:1;
+	/*
+	 * Indicate if this subsystem can be bound to a cgroup hierarchy
+	 * which has child cgroups.
+	 */
+	bool bindable:1;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 66a416b..caac80f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,7 @@
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
 #include <linux/eventfd.h>
 #include <linux/poll.h>
+#include <linux/bitops.h>
 
 #include <asm/atomic.h>
 
@@ -871,18 +872,13 @@ static void remove_dir(struct dentry *d)
 
 static void cgroup_clear_directory(struct dentry *dentry)
 {
-	struct list_head *node;
+	struct dentry *d, *tmp;
 
 	BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
 	spin_lock(&dcache_lock);
-	node = dentry->d_subdirs.next;
-	while (node != &dentry->d_subdirs) {
-		struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
-		list_del_init(node);
-		if (d->d_inode) {
-			/* This should never be called on a cgroup
-			 * directory with child cgroups */
-			BUG_ON(d->d_inode->i_mode & S_IFDIR);
+	list_for_each_entry_safe(d, tmp, &dentry->d_subdirs, d_u.d_child) {
+		if (d->d_inode && !(d->d_inode->i_mode & S_IFDIR)) {
+			list_del_init(&d->d_u.d_child);
 			d = dget_locked(d);
 			spin_unlock(&dcache_lock);
 			d_delete(d);
@@ -890,7 +886,6 @@ static void cgroup_clear_directory(struct dentry *dentry)
 			dput(d);
 			spin_lock(&dcache_lock);
 		}
-		node = dentry->d_subdirs.next;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -935,6 +930,171 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
 	css_put(css);
 }
 
+static void init_cgroup_css(struct cgroup_subsys_state *css,
+			       struct cgroup_subsys *ss,
+			       struct cgroup *cgrp)
+{
+	css->cgroup = cgrp;
+	atomic_set(&css->refcnt, 1);
+	css->flags = 0;
+	css->id = NULL;
+	if (cgrp == dummytop)
+		set_bit(CSS_ROOT, &css->flags);
+	BUG_ON(cgrp->subsys[ss->subsys_id]);
+	cgrp->subsys[ss->subsys_id] = css;
+}
+
+static int cgroup_attach_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct cgroup_subsys_state *css;
+	int ret;
+
+	css = ss->create(ss, cgrp);
+	if (IS_ERR(css))
+		return PTR_ERR(css);
+	init_cgroup_css(css, ss, cgrp);
+
+	if (ss->use_id) {
+		ret = alloc_css_id(ss, cgrp->parent, cgrp);
+		if (ret)
+			return ret;
+	}
+	/* At error, ->destroy() callback has to free assigned ID. */
+
+	if (clone_children(cgrp->parent) && ss->post_clone)
+		ss->post_clone(ss, cgrp);
+
+	return 0;
+}
+
+/*
+ * cgroup_walk_hierarchy - iterate through a cgroup hierarchy
+ * @process_cgroup: callback called on each cgroup in the hierarchy
+ * @data: will be passed to @process_cgroup
+ * @top_cgrp: the root cgroup of the hierarchy
+ *
+ * It's a pre-order traversal, so a parent cgroup will be processed before
+ * its children.
+ */
+static int cgroup_walk_hierarchy(int (*process_cgroup)(struct cgroup *, void *),
+				 void *data, struct cgroup *top_cgrp)
+{
+	struct cgroup *parent = top_cgrp;
+	struct cgroup *child;
+	struct list_head *node;
+	int ret;
+
+	node = parent->children.next;
+repeat:
+	while (node != &parent->children) {
+		child = list_entry(node, struct cgroup, sibling);
+
+		/* Process this cgroup */
+		ret = process_cgroup(child, data);
+		if (ret)
+			return ret;
+
+		/* Process its children */
+		if (!list_empty(&child->children)) {
+			parent = child;
+			node = parent->children.next;
+			goto repeat;
+		} else
+			node = node->next;
+	}
+
+	/* Process its siblings */
+	if (parent != top_cgrp) {
+		child = parent;
+		parent = child->parent;
+		node = child->sibling.next;
+		goto repeat;
+	}
+
+	return 0;
+}
+
+/*
+ * If hierarchy_attach_css() failed, do some cleanup.
+ */
+static int hierarchy_attach_css_failed(struct cgroup *cgrp, void *data)
+{
+	unsigned long added_bits = (unsigned long)data;
+	int i;
+
+	for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+		if (cgrp->subsys[i]) {
+			subsys[i]->destroy(subsys[i], cgrp);
+			cgrp->subsys[i] = NULL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Allocate css objects of added subsystems, and attach them to the
+ * existing cgroup.
+ */
+static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
+{
+	unsigned long added_bits = (unsigned long)data;
+	int i;
+	int ret = 0;
+
+	for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+		ret = cgroup_attach_css(subsys[i], cgrp);
+		if (ret)
+			break;
+	}
+
+	if (ret)
+		cgroup_walk_hierarchy(hierarchy_attach_css_failed, data,
+				      cgrp->top_cgroup);
+	return ret;
+}
+
+/*
+ * After attaching new css objects to the cgroup, we need to entangle
+ * them into the existing css_sets.
+ */
+static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+{
+	unsigned long added_bits = (unsigned long)data;
+	int i;
+	struct cg_cgroup_link *link;
+
+	write_lock(&css_set_lock);
+	list_for_each_entry(link, &cgrp->css_sets, cgrp_link_list) {
+		struct css_set *cg = link->cg;
+		struct hlist_head *hhead;
+
+		for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+			rcu_assign_pointer(cg->subsys[i], cgrp->subsys[i]);
+
+		/* rehash */
+		hlist_del(&cg->hlist);
+		hhead = css_set_hash(cg->subsys);
+		hlist_add_head(&cg->hlist, hhead);
+	}
+	write_unlock(&css_set_lock);
+
+	return 0;
+}
+
+/*
+ * Re-populate each cgroup directory.
+ *
+ * Note root cgroup's inode mutex is held.
+ */
+static int hierarchy_populate_dir(struct cgroup *cgrp, void *data)
+{
+	mutex_lock_nested(&cgrp->dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+	cgroup_populate_dir(cgrp);
+	mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
+	return 0;
+}
+
 /*
  * Call with cgroup_mutex held. Drops reference counts on modules, including
  * any duplicate ones that parse_cgroupfs_options took. If this function
@@ -946,36 +1106,59 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 	unsigned long added_bits, removed_bits;
 	struct cgroup *cgrp = &root->top_cgroup;
 	int i;
+	int err;
 
 	BUG_ON(!mutex_is_locked(&cgroup_mutex));
 
 	removed_bits = root->actual_subsys_bits & ~final_bits;
 	added_bits = final_bits & ~root->actual_subsys_bits;
+
 	/* Check that any added subsystems are currently free */
-	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-		unsigned long bit = 1UL << i;
-		struct cgroup_subsys *ss = subsys[i];
-		if (!(bit & added_bits))
-			continue;
+	for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
 		/*
 		 * Nobody should tell us to do a subsys that doesn't exist:
 		 * parse_cgroupfs_options should catch that case and refcounts
 		 * ensure that subsystems won't disappear once selected.
 		 */
-		BUG_ON(ss == NULL);
-		if (ss->root != &rootnode) {
+		BUG_ON(subsys[i] == NULL);
+		if (subsys[i]->root != &rootnode) {
 			/* Subsystem isn't free */
 			return -EBUSY;
 		}
 	}
 
-	/* Currently we don't handle adding/removing subsystems when
-	 * any child cgroups exist. This is theoretically supportable
-	 * but involves complex error handling, so it's being left until
-	 * later */
-	if (root->number_of_cgroups > 1)
+	/* Removing will be supported later */
+	if (root->number_of_cgroups > 1 && removed_bits)
 		return -EBUSY;
 
+	/*
+	 * For non-trivial hierarchy, check that added subsystems
+	 * are all bindable
+	 */
+	if (root->number_of_cgroups > 1) {
+		for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+			if (!subsys[i]->bindable)
+				return -EBUSY;
+	}
+
+	/* Attach css objects to the top cgroup */
+	for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+		BUG_ON(cgrp->subsys[i]);
+		BUG_ON(!dummytop->subsys[i]);
+		BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
+
+		cgrp->subsys[i] = dummytop->subsys[i];
+		cgrp->subsys[i]->cgroup = cgrp;
+	}
+
+	err = cgroup_walk_hierarchy(hierarchy_attach_css,
+				    (void *)added_bits, cgrp);
+	if (err)
+		goto failed;
+
+	cgroup_walk_hierarchy(hierarchy_update_css_sets,
+			      (void *)added_bits, cgrp);
+
 	/* Process each subsystem */
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
@@ -983,12 +1166,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 		if (bit & added_bits) {
 			/* We're binding this subsystem to this hierarchy */
 			BUG_ON(ss == NULL);
-			BUG_ON(cgrp->subsys[i]);
-			BUG_ON(!dummytop->subsys[i]);
-			BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
 			mutex_lock(&ss->hierarchy_mutex);
-			cgrp->subsys[i] = dummytop->subsys[i];
-			cgrp->subsys[i]->cgroup = cgrp;
 			list_move(&ss->sibling, &root->subsys_list);
 			ss->root = root;
 			if (ss->bind)
@@ -1001,10 +1179,10 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 			BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
 			BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
 			mutex_lock(&ss->hierarchy_mutex);
-			if (ss->bind)
-				ss->bind(ss, dummytop);
 			dummytop->subsys[i]->cgroup = dummytop;
 			cgrp->subsys[i] = NULL;
+			if (ss->bind)
+				ss->bind(ss, dummytop);
 			subsys[i]->root = &rootnode;
 			list_move(&ss->sibling, &rootnode.subsys_list);
 			mutex_unlock(&ss->hierarchy_mutex);
@@ -1031,6 +1209,12 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 	synchronize_rcu();
 
 	return 0;
+
+failed:
+	for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+		cgrp->subsys[i] = NULL;
+
+	return err;
 }
 
 static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
@@ -1286,6 +1470,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 
 	/* (re)populate subsystem files */
 	cgroup_populate_dir(cgrp);
+	cgroup_walk_hierarchy(hierarchy_populate_dir, NULL, cgrp);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -3313,20 +3498,6 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
 	return 0;
 }
 
-static void init_cgroup_css(struct cgroup_subsys_state *css,
-			       struct cgroup_subsys *ss,
-			       struct cgroup *cgrp)
-{
-	css->cgroup = cgrp;
-	atomic_set(&css->refcnt, 1);
-	css->flags = 0;
-	css->id = NULL;
-	if (cgrp == dummytop)
-		set_bit(CSS_ROOT, &css->flags);
-	BUG_ON(cgrp->subsys[ss->subsys_id]);
-	cgrp->subsys[ss->subsys_id] = css;
-}
-
 static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
 {
 	/* We need to take each hierarchy_mutex in a consistent order */
@@ -3401,21 +3572,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 		set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
 
 	for_each_subsys(root, ss) {
-		struct cgroup_subsys_state *css = ss->create(ss, cgrp);
-
-		if (IS_ERR(css)) {
-			err = PTR_ERR(css);
+		err = cgroup_attach_css(ss, cgrp);
+		if (err)
 			goto err_destroy;
-		}
-		init_cgroup_css(css, ss, cgrp);
-		if (ss->use_id) {
-			err = alloc_css_id(ss, parent, cgrp);
-			if (err)
-				goto err_destroy;
-		}
-		/* At error, ->destroy() callback has to free assigned ID. */
-		if (clone_children(parent) && ss->post_clone)
-			ss->post_clone(ss, cgrp);
 	}
 
 	cgroup_lock_hierarchy(root);
-- 
1.6.3



* [PATCH v2 3/6] cgroups: Allow to unbind subsystem from a cgroup hierarchy
  2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
  2010-12-15  9:35 ` [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys Li Zefan
  2010-12-15  9:35 ` [PATCH v2 2/6] cgroups: Allow to bind a subsystem to a cgroup hierarchy Li Zefan
@ 2010-12-15  9:35 ` Li Zefan
  2010-12-15  9:36 ` [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable Li Zefan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:35 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers

This allows us to unbind a cgroup subsystem from a hierarchy
which has sub-cgroups in it.

If a subsystem is to support unbinding, when pinning a cgroup
via css refcnt, it should use __css_tryget() instead of css_get().
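
The interplay between pinning and unbinding can be sketched with C11 atomics.
This is a userspace illustration under stated assumptions, not the kernel
code: struct ref and the ref_*() helpers are hypothetical names. It models a
tryget that fails once the count is 0, and an unbind step that may only drop
the count from 1 to 0, so it backs off while anyone holds a pin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a css: a bare reference count. */
struct ref {
	atomic_int refcnt;
};

/* A fresh object holds one base reference. */
static void ref_init(struct ref *r)
{
	atomic_init(&r->refcnt, 1);
}

/* Take a reference unless the count is 0 (object is being unbound). */
static bool ref_tryget(struct ref *r)
{
	int old = atomic_load(&r->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with ref_tryget(). */
static void ref_put(struct ref *r)
{
	int prev = atomic_fetch_sub(&r->refcnt, 1);

	assert(prev > 0);
	(void)prev;
}

/* Unbind step: succeed only if nobody holds a pin, moving 1 -> 0. */
static bool ref_freeze(struct ref *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&r->refcnt, &expected, 0);
}

/* Roll back a successful freeze, moving 0 -> 1. */
static void ref_unfreeze(struct ref *r)
{
	int expected = 0;

	atomic_compare_exchange_strong(&r->refcnt, &expected, 1);
}
```

An unconditional increment would not work here, because it could revive a
count that the unbind path has already dropped to 0.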

Usage:

 # mount -t cgroup -o cpuset,cpuacct xxx /mnt
 # mkdir /mnt/tmp
 # echo $$ > /mnt/tmp/tasks

 (remove it from the hierarchy)
 # mount -o remount,cpuset xxx /mnt

Changelog v2:

- Allow a cgroup subsystem to use css refcnt.
- Add more code comments.
- Use rcu_assign_pointer() in hierarchy_update_css_sets().
- Split can_bind flag to bindable and unbindable flags.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |   17 ++++++
 kernel/cgroup.c        |  139 +++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 138 insertions(+), 18 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d8c4e22..17579b2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -110,6 +110,18 @@ static inline bool css_is_removed(struct cgroup_subsys_state *css)
 }
 
 /*
+ * For a subsystem which supports unbinding, call this to get css
+ * refcnt. Called with rcu_read_lock or cgroup_mutex held.
+ */
+
+static inline bool __css_tryget(struct cgroup_subsys_state *css)
+{
+	if (test_bit(CSS_ROOT, &css->flags))
+		return true;
+	return atomic_inc_not_zero(&css->refcnt);
+}
+
+/*
  * Call css_tryget() to take a reference on a css if your existing
  * (known-valid) reference isn't already ref-counted. Returns false if
  * the css has been destroyed.
@@ -495,6 +507,11 @@ struct cgroup_subsys {
 	 * which has child cgroups.
 	 */
 	bool bindable:1;
+	/*
+	 * Indicate if this subsystem can be removed from a cgroup hierarchy
+	 * which has child cgroups.
+	 */
+	bool unbindable:1;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index caac80f..463575d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1055,12 +1055,61 @@ static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
 }
 
 /*
- * After attaching new css objects to the cgroup, we need to entangle
- * them into the existing css_sets.
+ * Reset those css objects whose refcnts are cleared.
  */
-static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+static int hierarchy_reset_css_refs(struct cgroup *cgrp, void *data)
+{
+	unsigned long removed_bits = (unsigned long)data;
+	int i;
+
+	for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+		if (atomic_read(&css->refcnt) == 0)
+			atomic_set(&css->refcnt, 1);
+	}
+	return 0;
+}
+
+/*
+ * Clear all the css objects' refcnt to 0. If there's a refcnt > 1,
+ * return failure.
+ */
+static int hierarchy_clear_css_refs(struct cgroup *cgrp, void *data)
+{
+	unsigned long removed_bits = (unsigned long)data;
+	int i;
+
+	for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+		struct cgroup_subsys_state *css = cgrp->subsys[i];
+
+		if (atomic_cmpxchg(&css->refcnt, 1, 0) != 1)
+			goto failed;
+	}
+	return 0;
+failed:
+	hierarchy_reset_css_refs(struct cgroup *cgrp, void *data);
+	return -EBUSY;
+}
+
+/*
+ * We're removing some subsystems from cgroup hierarchy, and here we
+ * remove and destroy the css objects from each cgroup.
+ */
+static int hierarchy_remove_css(struct cgroup *cgrp, void *data)
+{
+	unsigned long removed_bits = (unsigned long)data;
+	int i;
+
+	for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+		subsys[i]->destroy(subsys[i], cgrp);
+		cgrp->subsys[i] = NULL;
+	}
+
+	return 0;
+}
+
+static int hierarchy_update_css_sets(struct cgroup *cgrp,
+				     unsigned long bits, bool add)
 {
-	unsigned long added_bits = (unsigned long)data;
 	int i;
 	struct cg_cgroup_link *link;
 
@@ -1069,8 +1118,14 @@ static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
 		struct css_set *cg = link->cg;
 		struct hlist_head *hhead;
 
-		for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
-			rcu_assign_pointer(cg->subsys[i], cgrp->subsys[i]);
+		for_each_set_bit(i, &bits, CGROUP_SUBSYS_COUNT) {
+			if (add)
+				rcu_assign_pointer(cg->subsys[i],
+						   cgrp->subsys[i]);
+			else
+				rcu_assign_pointer(cg->subsys[i],
+						   dummytop->subsys[i]);
+		}
 
 		/* rehash */
 		hlist_del(&cg->hlist);
@@ -1083,6 +1138,30 @@ static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
 }
 
 /*
+ * After attaching new css objects to the cgroup, we need to entangle
+ * them into the existing css_sets.
+ */
+static int hierarchy_add_to_css_sets(struct cgroup *cgrp, void *data)
+{
+	unsigned long added_bits = (unsigned long)data;
+
+	hierarchy_update_css_sets(cgrp, added_bits, true);
+	return 0;
+}
+
+/*
+ * Before detaching and destroying css objects from the cgroup, we
+ * should detangle them from the existing css_sets.
+ */
+static int hierarchy_remove_from_css_sets(struct cgroup *cgrp, void *data)
+{
+	unsigned long removed_bits = (unsigned long)data;
+
+	hierarchy_update_css_sets(cgrp, removed_bits, false);
+	return 0;
+}
+
+/*
  * Re-populate each cgroup directory.
  *
  * Note root cgroup's inode mutex is held.
@@ -1127,18 +1206,17 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 		}
 	}
 
-	/* Removing will be supported later */
-	if (root->number_of_cgroups > 1 && removed_bits)
-		return -EBUSY;
-
 	/*
 	 * For non-trivial hierarchy, check that added subsystems
-	 * are all bindable
+	 * are all bindable and removed subsystems are all unbindable
 	 */
 	if (root->number_of_cgroups > 1) {
 		for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
 			if (!subsys[i]->bindable)
 				return -EBUSY;
+		for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT)
+			if (!subsys[i]->unbindable)
+				return -EBUSY;
 	}
 
 	/* Attach css objects to the top cgroup */
@@ -1154,9 +1232,14 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 	err = cgroup_walk_hierarchy(hierarchy_attach_css,
 				    (void *)added_bits, cgrp);
 	if (err)
-		goto failed;
+		goto out;
+
+	err = cgroup_walk_hierarchy(hierarchy_clear_css_refs,
+				    (void *)removed_bits, cgrp);
+	if (err)
+		goto out_remove_css;
 
-	cgroup_walk_hierarchy(hierarchy_update_css_sets,
+	cgroup_walk_hierarchy(hierarchy_add_to_css_sets,
 			      (void *)added_bits, cgrp);
 
 	/* Process each subsystem */
@@ -1176,11 +1259,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 		} else if (bit & removed_bits) {
 			/* We're removing this subsystem */
 			BUG_ON(ss == NULL);
-			BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
-			BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
 			mutex_lock(&ss->hierarchy_mutex);
-			dummytop->subsys[i]->cgroup = dummytop;
-			cgrp->subsys[i] = NULL;
 			if (ss->bind)
 				ss->bind(ss, dummytop);
 			subsys[i]->root = &rootnode;
@@ -1206,11 +1285,35 @@ static int rebind_subsystems(struct cgroupfs_root *root,
 		}
 	}
 	root->subsys_bits = root->actual_subsys_bits = final_bits;
+
+	for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+		BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
+		BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
+
+		dummytop->subsys[i]->cgroup = dummytop;
+		cgrp->subsys[i] = NULL;
+	}
+
+	cgroup_walk_hierarchy(hierarchy_remove_from_css_sets,
+			      (void *)removed_bits, cgrp);
+
+	/*
+	 * There might be some pointers to the cgroup_subsys_state
+	 * that we are going to destroy.
+	 */
+	synchronize_rcu();
+
+	cgroup_walk_hierarchy(hierarchy_remove_css,
+			      (void *)removed_bits, cgrp);
+
 	synchronize_rcu();
 
 	return 0;
 
-failed:
+out_remove_css:
+	cgroup_walk_hierarchy(hierarchy_remove_css,
+			      (void *)added_bits, cgrp);
+out:
 	for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
 		cgrp->subsys[i] = NULL;
 
-- 
1.6.3



* [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable
  2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
                   ` (2 preceding siblings ...)
  2010-12-15  9:35 ` [PATCH v2 3/6] cgroups: Allow to unbind subsystem from " Li Zefan
@ 2010-12-15  9:36 ` Li Zefan
  2010-12-15  9:36 ` [PATCH v2 5/6] cgroups: Trigger BUG if a bindable subsystem calls css_get() Li Zefan
  2010-12-15  9:36 ` [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems Li Zefan
  5 siblings, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:36 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers, Ingo Molnar, serge

For those subsystems (debug, cpuacct, net_cls and devices),
setting the bindable/unbindable flag is sufficient.

Mark the freezer subsystem as bindable but not unbindable, because
sub-cgroups can be in the FROZEN state.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup.c          |    6 +++++-
 kernel/cgroup_freezer.c  |    1 +
 kernel/sched.c           |    2 ++
 security/device_cgroup.c |    2 ++
 4 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 463575d..fa2c5de 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1063,6 +1063,8 @@ static int hierarchy_reset_css_refs(struct cgroup *cgrp, void *data)
 	int i;
 
 	for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+		struct cgroup_subsys_state *css = cgrp->subsys[i];
+
 		if (atomic_read(&css->refcnt) == 0)
 			atomic_set(&css->refcnt, 1);
 	}
@@ -1086,7 +1088,7 @@ static int hierarchy_clear_css_refs(struct cgroup *cgrp, void *data)
 	}
 	return 0;
 failed:
-	hierarchy_reset_css_refs(struct cgroup *cgrp, void *data);
+	hierarchy_reset_css_refs(cgrp, data);
 	return -EBUSY;
 }
 
@@ -5201,5 +5203,7 @@ struct cgroup_subsys debug_subsys = {
 	.destroy = debug_destroy,
 	.populate = debug_populate,
 	.subsys_id = debug_subsys_id,
+	.bindable = true,
+	.unbindable = true,
 };
 #endif /* CONFIG_CGROUP_DEBUG */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e7bebb7..213ecd9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -393,4 +393,5 @@ struct cgroup_subsys freezer_subsys = {
 	.attach		= NULL,
 	.fork		= freezer_fork,
 	.exit		= NULL,
+	.bindable	= true,
 };
diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..930ee2e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9346,6 +9346,8 @@ struct cgroup_subsys cpuacct_subsys = {
 	.destroy = cpuacct_destroy,
 	.populate = cpuacct_populate,
 	.subsys_id = cpuacct_subsys_id,
+	.bindable = true,
+	.unbindable = true,
 };
 #endif	/* CONFIG_CGROUP_CPUACCT */
 
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 8d9c48f..51321e9 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -473,6 +473,8 @@ struct cgroup_subsys devices_subsys = {
 	.destroy = devcgroup_destroy,
 	.populate = devcgroup_populate,
 	.subsys_id = devices_subsys_id,
+	.bindable = true,
+	.unbindable = true,
 };
 
 int devcgroup_inode_permission(struct inode *inode, int mask)
-- 
1.6.3



* [PATCH v2 5/6] cgroups: Trigger BUG if a bindable subsystem calls css_get()
  2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
                   ` (3 preceding siblings ...)
  2010-12-15  9:36 ` [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable Li Zefan
@ 2010-12-15  9:36 ` Li Zefan
  2010-12-15  9:36 ` [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems Li Zefan
  5 siblings, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:36 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers

For now, unbindable subsystems should not use css_get/put(), so check
for this misuse.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |    7 +++++--
 kernel/cgroup.c        |    5 +++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 17579b2..e8ad9f1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -80,13 +80,15 @@ struct cgroup_subsys_state {
 
 /* bits in struct cgroup_subsys_state flags field */
 enum {
-	CSS_ROOT, /* This CSS is the root of the subsystem */
-	CSS_REMOVED, /* This CSS is dead */
+	CSS_ROOT,	/* This CSS is the root of the subsystem */
+	CSS_REMOVED,	/* This CSS is dead */
+	CSS_NO_GET,	/* Forbid calling css_get/put() */
 };
 
 /* Caller must verify that the css is not for root cgroup */
 static inline void __css_get(struct cgroup_subsys_state *css, int count)
 {
+	BUG_ON(test_bit(CSS_NO_GET, &css->flags));
 	atomic_add(count, &css->refcnt);
 }
 
@@ -131,6 +133,7 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
 {
 	if (test_bit(CSS_ROOT, &css->flags))
 		return true;
+	BUG_ON(test_bit(CSS_NO_GET, &css->flags));
 	while (!atomic_inc_not_zero(&css->refcnt)) {
 		if (test_bit(CSS_REMOVED, &css->flags))
 			return false;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index fa2c5de..d49a459 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -938,6 +938,11 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
 	atomic_set(&css->refcnt, 1);
 	css->flags = 0;
 	css->id = NULL;
+
+	/* For now, unbindable subsystems should not call css_get/put(). */
+	if (ss->unbindable)
+		set_bit(CSS_NO_GET, &css->flags);
+
 	if (cgrp == dummytop)
 		set_bit(CSS_ROOT, &css->flags);
 	BUG_ON(cgrp->subsys[ss->subsys_id]);
-- 
1.6.3



* [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems
  2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
                   ` (4 preceding siblings ...)
  2010-12-15  9:36 ` [PATCH v2 5/6] cgroups: Trigger BUG if a bindable subsystem calls css_get() Li Zefan
@ 2010-12-15  9:36 ` Li Zefan
  5 siblings, 0 replies; 7+ messages in thread
From: Li Zefan @ 2010-12-15  9:36 UTC
  To: Andrew Morton
  Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
	Stephane Eranian, LKML, containers

Provide a usage example, update the bind() callback API, etc.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 Documentation/cgroups/cgroups.txt |   37 +++++++++++++++++++++++++++++--------
 1 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..4e772cc 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -363,17 +363,23 @@ Note this will add ns to the hierarchy but won't remove memory or
 cpuset, because the new options are appended to the old ones:
 # mount -o remount,ns /dev/cgroup
 
+Some subsystems can be bound to a mounted hierarchy or removed from
+it, even if the hierarchy contains sub-cgroups:
+# mount -t cgroup -o freezer hier1 /dev/cgroup
+# mkdir /dev/cgroup/my_cgroup; echo $$ > /dev/cgroup/my_cgroup/tasks
+# mount -o remount,freezer,cpuset hier1 /dev/cgroup
+(fails, because cpuset is not bindable)
+# mount -o remount,freezer,cpuacct hier1 /dev/cgroup
+# mount -o remount,cpuacct hier1 /dev/cgroup
+
+Note that cpuacct must sit in the default hierarchy before the remount.
+
 To Specify a hierarchy's release_agent:
 # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
   xxx /dev/cgroup
 
 Note that specifying 'release_agent' more than once will return failure.
 
-Note that changing the set of subsystems is currently only supported
-when the hierarchy consists of a single (root) cgroup. Supporting
-the ability to arbitrarily bind/unbind subsystems from an existing
-cgroup hierarchy is intended to be implemented in the future.
-
 Then under /dev/cgroup you can find a tree that corresponds to the
 tree of the cgroups in the system. For instance, /dev/cgroup
 is the cgroup that holds the whole system.
@@ -523,6 +529,15 @@ module initcall a call to cgroup_load_subsys(), and in its exitcall a
 call to cgroup_unload_subsys(). It should also set its_subsys.module =
 THIS_MODULE in its .c file.
 
+If a subsystem has the bindable flag set, it normally has to support
+side-effect-free movement of a task into any just-created cgroup,
+i.e. the flag is probably not suitable for a subsystem where
+can_attach() might fail for the newly-created cgroup, or
+attach() might have side-effects for those same cases.
+
+If a subsystem has the unbindable flag set, it normally has to support
+side-effect-free movement of a task into the root cgroup.
+
 Each subsystem may export the following methods. The only mandatory
 methods are create/destroy. Any others that are null are presumed to
 be successful no-ops.
@@ -627,9 +642,15 @@ void bind(struct cgroup_subsys *ss, struct cgroup *root)
 (cgroup_mutex and ss->hierarchy_mutex held by caller)
 
 Called when a cgroup subsystem is rebound to a different hierarchy
-and root cgroup. Currently this will only involve movement between
-the default hierarchy (which never has sub-cgroups) and a hierarchy
-that is being created/destroyed (and hence has no sub-cgroups).
+and root cgroup.
+
+For non-bindable subsystems, this will only involve movement
+between the default hierarchy (which never has sub-cgroups) and a
+hierarchy that is being created/destroyed (and hence has no sub-cgroups).
+
+For bindable subsystems, this may also involve movement between the
+default hierarchy and a mounted hierarchy that's populated with
+sub-cgroups.
 
 4. Questions
 ============
-- 
1.6.3
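
The rules stated in the documentation update above can be summarised as
two checks. The sketch below is a user-space illustration, not the code
from this series: the model_* types and functions are hypothetical, and
returning -EBUSY on refusal is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they do not match struct cgroup_subsys. */
struct model_subsys {
	bool bindable;             /* may be bound to a populated hierarchy */
	bool unbindable;           /* may be removed from a populated hierarchy */
	bool in_default_hierarchy; /* not bound to any mounted hierarchy */
};

struct model_hierarchy {
	int nr_sub_cgroups;        /* 0 means only the root cgroup exists */
};

/* Binding: the subsystem must be free, and bindable if sub-cgroups exist. */
static int model_check_bind(const struct model_subsys *ss,
			    const struct model_hierarchy *h)
{
	assert(ss != NULL && h != NULL);
	if (!ss->in_default_hierarchy)
		return -EBUSY;	/* assumed error code */
	if (h->nr_sub_cgroups > 0 && !ss->bindable)
		return -EBUSY;
	return 0;
}

/* Unbinding from a populated hierarchy needs the unbindable flag. */
static int model_check_unbind(const struct model_subsys *ss,
			      const struct model_hierarchy *h)
{
	assert(ss != NULL && h != NULL);
	if (h->nr_sub_cgroups > 0 && !ss->unbindable)
		return -EBUSY;
	return 0;
}
```

With these checks, a subsystem that sets neither flag can still be
attached to or detached from a hierarchy that consists of a single root
cgroup, which matches the behaviour documented before this series.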



Thread overview: 7+ messages
2010-12-15  9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
2010-12-15  9:35 ` [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys Li Zefan
2010-12-15  9:35 ` [PATCH v2 2/6] cgroups: Allow to bind a subsystem to a cgroup hierarchy Li Zefan
2010-12-15  9:35 ` [PATCH v2 3/6] cgroups: Allow to unbind subsystem from " Li Zefan
2010-12-15  9:36 ` [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable Li Zefan
2010-12-15  9:36 ` [PATCH v2 5/6] cgroups: Triger BUG if a bindable subsystem calls css_get() Li Zefan
2010-12-15  9:36 ` [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems Li Zefan
