* [PATCH v2 0/6] cgroups: Bindable cgroup subsystems
From: Li Zefan @ 2010-12-15 9:34 UTC
To: Andrew Morton
Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
Stephane Eranian, LKML, containers
Stephane posted a patchset to add a perf_cgroup subsystem, so that perf can
be used to monitor all threads belonging to a cgroup.
However, if you have already mounted a cgroup hierarchy without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you can't use the per-cgroup perf feature.
This patchset alleviates the pain: a subsystem can now be
bound to or unbound from a hierarchy which has sub-cgroups in it.
Some subsystems still can't take advantage of this patchset, memcgroup
and cpuset for example.
For cpuset, if a hierarchy has a sub-cgroup and that cgroup has tasks,
we can't decide the sub-cgroup's cpuset.mems and cpuset.cpus automatically
when we try to bind cpuset to this hierarchy.
Memcgroup uses css_get/put(), and due to some complexity,
for now bindable subsystems should not use css_get/put().
Usage:
# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks
(add cpuacct to the hierarchy)
# mount -o remount,cpuset,cpuacct xxx /mnt
(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt
There's another limitation: cpuacct must not already be bound to any mounted
hierarchy before the above operation. But that's not a problem, as you
can remove it from that hierarchy and bind it to another one.
Changelog v2:
- Fix some bugs.
- Split can_bind flag into bindable and unbindable flags
- Provide a __css_tryget() so a bindable subsystem can pin a cgroup
via it.
- ...
---
Documentation/cgroups/cgroups.txt | 37 +++-
include/linux/cgroup.h | 39 +++-
kernel/cgroup.c | 391 +++++++++++++++++++++++++++++++------
kernel/cgroup_freezer.c | 1 +
kernel/sched.c | 2 +
security/device_cgroup.c | 2 +
6 files changed, 398 insertions(+), 74 deletions(-)
* [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys
From: Li Zefan @ 2010-12-15 9:35 UTC
To: Andrew Morton
Cc: containers, LKML,
Stephane Eranian, Paul Menage
On x86_32, sizeof(struct cgroup_subsys) shrinks from 276 bytes
to 264.
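To illustrate where the saving comes from, here is a rough user-space sketch. The struct names are hypothetical stand-ins, not the kernel's struct cgroup_subsys, and bitfield layout is implementation-defined, so the exact numbers depend on the ABI:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the old layout: three int flags plus one bool. */
struct flags_wide {
	int subsys_id;
	int active;
	int disabled;
	int early_init;
	bool use_id;
};

/* Hypothetical stand-in for the new layout: four one-bit bool bitfields. */
struct flags_packed {
	int subsys_id;
	bool active:1;
	bool disabled:1;
	bool early_init:1;
	bool use_id:1;
};

/* Bytes saved by packing the flags; the exact value depends on the ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct flags_wide) - sizeof(struct flags_packed);
}
```

On common x86 ABIs I would expect the four flags to collapse into a single byte plus padding, which is consistent with the 12-byte difference quoted above, but treat that as an expectation rather than a guarantee.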
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/cgroup.h | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed4ba11..63d953d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -481,14 +481,16 @@ struct cgroup_subsys {
void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
int subsys_id;
- int active;
- int disabled;
- int early_init;
+
+ bool active:1;
+ bool disabled:1;
+ bool early_init:1;
/*
* True if this subsys uses ID. ID is not available before cgroup_init()
* (not available in early_init time.)
*/
- bool use_id;
+ bool use_id:1;
+
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
--
1.6.3
* [PATCH v2 2/6] cgroups: Allow to bind a subsystem to a cgroup hierarchy
From: Li Zefan @ 2010-12-15 9:35 UTC
To: Andrew Morton
Cc: containers, LKML,
Stephane Eranian, Paul Menage
Stephane posted a patchset to add a perf_cgroup subsystem, so that perf can
be used to monitor all threads belonging to a cgroup.
However, if you have already mounted a cgroup hierarchy without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you can't use the per-cgroup perf feature.
This patch alleviates the pain: a subsystem can now be bound to
a hierarchy which has sub-cgroups in it.
Matt also commented that users will appreciate this feature.
For a cgroup subsystem to become bindable, the bindable flag of
struct cgroup_subsys should be set.
Due to some constraints, not all subsystems can take advantage of
this patch. For example, we can't decide a cgroup's cpuset.mems and
cpuset.cpus automatically, so cpuset is not bindable.
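The admission checks that this patch adds to rebind_subsystems() can be modelled in user space roughly as follows. This is only a sketch: struct subsys_model, SUBSYS_COUNT and check_rebind() are hypothetical names, and the real code works on struct cgroup_subsys and the global subsys[] array.

```c
#include <errno.h>
#include <stdbool.h>

#define SUBSYS_COUNT 8	/* hypothetical; stands in for CGROUP_SUBSYS_COUNT */

/* Hypothetical, simplified model of one subsystem's state. */
struct subsys_model {
	bool bindable;	/* may be added to a hierarchy with child cgroups */
	bool busy;	/* already bound to some mounted hierarchy */
};

/*
 * Decide whether a remount from actual_bits to final_bits is allowed,
 * following the order of checks in this patch.
 * Returns 0 if allowed, -EBUSY otherwise.
 */
static int check_rebind(const struct subsys_model *ss,
			unsigned long actual_bits, unsigned long final_bits,
			int number_of_cgroups)
{
	unsigned long removed_bits = actual_bits & ~final_bits;
	unsigned long added_bits = final_bits & ~actual_bits;
	int i;

	/* Added subsystems must not be attached elsewhere. */
	for (i = 0; i < SUBSYS_COUNT; i++)
		if ((added_bits & (1UL << i)) && ss[i].busy)
			return -EBUSY;

	if (number_of_cgroups > 1) {
		/* Removal with child cgroups is not handled by this patch. */
		if (removed_bits)
			return -EBUSY;
		/* With child cgroups, every added subsystem must be bindable. */
		for (i = 0; i < SUBSYS_COUNT; i++)
			if ((added_bits & (1UL << i)) && !ss[i].bindable)
				return -EBUSY;
	}
	return 0;
}
```

A hierarchy with only the root cgroup (number_of_cgroups == 1) keeps the old behaviour: any free subsystem may be added or removed regardless of the bindable flag.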
Usage:
# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks
(assume cpuacct is bindable, and we add cpuacct to the hierarchy)
# mount -o remount,cpuset,cpuacct xxx /mnt
Changelog v2:
- Add more code comments.
- Use rcu_assign_pointer in hierarchy_update_css_sets().
- Fix to nullify css pointers in hierarchy_attach_css_failed().
- Fix to call post_clone() for newly-created css.
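The patch walks the hierarchy with an iterative pre-order traversal, so a parent cgroup is always processed before its children. A minimal user-space sketch of the same idea follows. It assumes a simplified node with parent/first_child/next_sibling pointers instead of the kernel's list_heads, and struct node, struct visit_log, walk_preorder() and record_id() are all hypothetical names:

```c
#include <stddef.h>

/* Hypothetical simplified node; the kernel uses struct cgroup with lists. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	int id;
};

/*
 * Visit every descendant of top (not top itself) in pre-order, without
 * recursion. Stops and returns the callback's value if it is non-zero.
 */
static int walk_preorder(struct node *top,
			 int (*visit)(struct node *, void *), void *data)
{
	struct node *pos = top->first_child;
	int ret;

	while (pos) {
		ret = visit(pos, data);
		if (ret)
			return ret;

		if (pos->first_child) {
			pos = pos->first_child;	/* descend */
			continue;
		}
		/* Climb until a sibling exists or we are back at top. */
		while (pos != top && !pos->next_sibling)
			pos = pos->parent;
		if (pos == top)
			break;
		pos = pos->next_sibling;
	}
	return 0;
}

/* Example callback: append the visited id to a caller-supplied log. */
struct visit_log {
	int ids[16];
	int count;
};

static int record_id(struct node *n, void *data)
{
	struct visit_log *log = data;

	if (log->count >= 16)
		return -1;	/* log full: abort the walk */
	log->ids[log->count++] = n->id;
	return 0;
}
```

As in the patch, the root is not passed to the callback; the caller handles it separately before walking the children.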
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/cgroup.h | 5 +
kernel/cgroup.c | 273 ++++++++++++++++++++++++++++++++++++++----------
2 files changed, 221 insertions(+), 57 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 63d953d..d8c4e22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -490,6 +490,11 @@ struct cgroup_subsys {
* (not available in early_init time.)
*/
bool use_id:1;
+ /*
+ * Indicate if this subsystem can be bound to a cgroup hierarchy
+ * which has child cgroups.
+ */
+ bool bindable:1;
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 66a416b..caac80f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,7 @@
#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
#include <linux/eventfd.h>
#include <linux/poll.h>
+#include <linux/bitops.h>
#include <asm/atomic.h>
@@ -871,18 +872,13 @@ static void remove_dir(struct dentry *d)
static void cgroup_clear_directory(struct dentry *dentry)
{
- struct list_head *node;
+ struct dentry *d, *tmp;
BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
spin_lock(&dcache_lock);
- node = dentry->d_subdirs.next;
- while (node != &dentry->d_subdirs) {
- struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
- list_del_init(node);
- if (d->d_inode) {
- /* This should never be called on a cgroup
- * directory with child cgroups */
- BUG_ON(d->d_inode->i_mode & S_IFDIR);
+ list_for_each_entry_safe(d, tmp, &dentry->d_subdirs, d_u.d_child) {
+ if (d->d_inode && !(d->d_inode->i_mode & S_IFDIR)) {
+ list_del_init(&d->d_u.d_child);
d = dget_locked(d);
spin_unlock(&dcache_lock);
d_delete(d);
@@ -890,7 +886,6 @@ static void cgroup_clear_directory(struct dentry *dentry)
dput(d);
spin_lock(&dcache_lock);
}
- node = dentry->d_subdirs.next;
}
spin_unlock(&dcache_lock);
}
@@ -935,6 +930,171 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
css_put(css);
}
+static void init_cgroup_css(struct cgroup_subsys_state *css,
+ struct cgroup_subsys *ss,
+ struct cgroup *cgrp)
+{
+ css->cgroup = cgrp;
+ atomic_set(&css->refcnt, 1);
+ css->flags = 0;
+ css->id = NULL;
+ if (cgrp == dummytop)
+ set_bit(CSS_ROOT, &css->flags);
+ BUG_ON(cgrp->subsys[ss->subsys_id]);
+ cgrp->subsys[ss->subsys_id] = css;
+}
+
+static int cgroup_attach_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+ struct cgroup_subsys_state *css;
+ int ret;
+
+ css = ss->create(ss, cgrp);
+ if (IS_ERR(css))
+ return PTR_ERR(css);
+ init_cgroup_css(css, ss, cgrp);
+
+ if (ss->use_id) {
+ ret = alloc_css_id(ss, cgrp->parent, cgrp);
+ if (ret)
+ return ret;
+ }
+ /* At error, ->destroy() callback has to free assigned ID. */
+
+ if (clone_children(cgrp->parent) && ss->post_clone)
+ ss->post_clone(ss, cgrp);
+
+ return 0;
+}
+
+/*
+ * cgroup_walk_hierarchy - iterate through a cgroup hierarchy
+ * @process_cgroup: callback called on each cgroup in the hierarchy
+ * @data: will be passed to @process_cgroup
+ * @top_cgrp: the root cgroup of the hierarchy
+ *
+ * It's a pre-order traversal, so a parent cgroup will be processed before
+ * its children.
+ */
+static int cgroup_walk_hierarchy(int (*process_cgroup)(struct cgroup *, void *),
+ void *data, struct cgroup *top_cgrp)
+{
+ struct cgroup *parent = top_cgrp;
+ struct cgroup *child;
+ struct list_head *node;
+ int ret;
+
+ node = parent->children.next;
+repeat:
+ while (node != &parent->children) {
+ child = list_entry(node, struct cgroup, sibling);
+
+ /* Process this cgroup */
+ ret = process_cgroup(child, data);
+ if (ret)
+ return ret;
+
+ /* Process its children */
+ if (!list_empty(&child->children)) {
+ parent = child;
+ node = parent->children.next;
+ goto repeat;
+ } else
+ node = node->next;
+ }
+
+ /* Process its siblings */
+ if (parent != top_cgrp) {
+ child = parent;
+ parent = child->parent;
+ node = child->sibling.next;
+ goto repeat;
+ }
+
+ return 0;
+}
+
+/*
+ * If hierarchy_attach_css() failed, do some cleanup.
+ */
+static int hierarchy_attach_css_failed(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+ if (cgrp->subsys[i]) {
+ subsys[i]->destroy(subsys[i], cgrp);
+ cgrp->subsys[i] = NULL;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Allocate css objects of added subsystems, and attach them to the
+ * existing cgroup.
+ */
+static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+ int i;
+ int ret = 0;
+
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+ ret = cgroup_attach_css(subsys[i], cgrp);
+ if (ret)
+ break;
+ }
+
+ if (ret)
+ cgroup_walk_hierarchy(hierarchy_attach_css_failed, data,
+ cgrp->top_cgroup);
+ return ret;
+}
+
+/*
+ * After attaching new css objects to the cgroup, we need to entangle
+ * them into the existing css_sets.
+ */
+static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+ int i;
+ struct cg_cgroup_link *link;
+
+ write_lock(&css_set_lock);
+ list_for_each_entry(link, &cgrp->css_sets, cgrp_link_list) {
+ struct css_set *cg = link->cg;
+ struct hlist_head *hhead;
+
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+ rcu_assign_pointer(cg->subsys[i], cgrp->subsys[i]);
+
+ /* rehash */
+ hlist_del(&cg->hlist);
+ hhead = css_set_hash(cg->subsys);
+ hlist_add_head(&cg->hlist, hhead);
+ }
+ write_unlock(&css_set_lock);
+
+ return 0;
+}
+
+/*
+ * Re-populate each cgroup directory.
+ *
+ * Note root cgroup's inode mutex is held.
+ */
+static int hierarchy_populate_dir(struct cgroup *cgrp, void *data)
+{
+ mutex_lock_nested(&cgrp->dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+ cgroup_populate_dir(cgrp);
+ mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
+ return 0;
+}
+
/*
* Call with cgroup_mutex held. Drops reference counts on modules, including
* any duplicate ones that parse_cgroupfs_options took. If this function
@@ -946,36 +1106,59 @@ static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long added_bits, removed_bits;
struct cgroup *cgrp = &root->top_cgroup;
int i;
+ int err;
BUG_ON(!mutex_is_locked(&cgroup_mutex));
removed_bits = root->actual_subsys_bits & ~final_bits;
added_bits = final_bits & ~root->actual_subsys_bits;
+
/* Check that any added subsystems are currently free */
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- unsigned long bit = 1UL << i;
- struct cgroup_subsys *ss = subsys[i];
- if (!(bit & added_bits))
- continue;
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
/*
* Nobody should tell us to do a subsys that doesn't exist:
* parse_cgroupfs_options should catch that case and refcounts
* ensure that subsystems won't disappear once selected.
*/
- BUG_ON(ss == NULL);
- if (ss->root != &rootnode) {
+ BUG_ON(subsys[i] == NULL);
+ if (subsys[i]->root != &rootnode) {
/* Subsystem isn't free */
return -EBUSY;
}
}
- /* Currently we don't handle adding/removing subsystems when
- * any child cgroups exist. This is theoretically supportable
- * but involves complex error handling, so it's being left until
- * later */
- if (root->number_of_cgroups > 1)
+ /* Removing will be supported later */
+ if (root->number_of_cgroups > 1 && removed_bits)
return -EBUSY;
+ /*
+ * For non-trivial hierarchy, check that added subsystems
+ * are all bindable
+ */
+ if (root->number_of_cgroups > 1) {
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+ if (!subsys[i]->bindable)
+ return -EBUSY;
+ }
+
+ /* Attach css objects to the top cgroup */
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+ BUG_ON(cgrp->subsys[i]);
+ BUG_ON(!dummytop->subsys[i]);
+ BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
+
+ cgrp->subsys[i] = dummytop->subsys[i];
+ cgrp->subsys[i]->cgroup = cgrp;
+ }
+
+ err = cgroup_walk_hierarchy(hierarchy_attach_css,
+ (void *)added_bits, cgrp);
+ if (err)
+ goto failed;
+
+ cgroup_walk_hierarchy(hierarchy_update_css_sets,
+ (void *)added_bits, cgrp);
+
/* Process each subsystem */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
@@ -983,12 +1166,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
if (bit & added_bits) {
/* We're binding this subsystem to this hierarchy */
BUG_ON(ss == NULL);
- BUG_ON(cgrp->subsys[i]);
- BUG_ON(!dummytop->subsys[i]);
- BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
mutex_lock(&ss->hierarchy_mutex);
- cgrp->subsys[i] = dummytop->subsys[i];
- cgrp->subsys[i]->cgroup = cgrp;
list_move(&ss->sibling, &root->subsys_list);
ss->root = root;
if (ss->bind)
@@ -1001,10 +1179,10 @@ static int rebind_subsystems(struct cgroupfs_root *root,
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
mutex_lock(&ss->hierarchy_mutex);
- if (ss->bind)
- ss->bind(ss, dummytop);
dummytop->subsys[i]->cgroup = dummytop;
cgrp->subsys[i] = NULL;
+ if (ss->bind)
+ ss->bind(ss, dummytop);
subsys[i]->root = &rootnode;
list_move(&ss->sibling, &rootnode.subsys_list);
mutex_unlock(&ss->hierarchy_mutex);
@@ -1031,6 +1209,12 @@ static int rebind_subsystems(struct cgroupfs_root *root,
synchronize_rcu();
return 0;
+
+failed:
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+ cgrp->subsys[i] = NULL;
+
+ return err;
}
static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
@@ -1286,6 +1470,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
/* (re)populate subsystem files */
cgroup_populate_dir(cgrp);
+ cgroup_walk_hierarchy(hierarchy_populate_dir, NULL, cgrp);
if (opts.release_agent)
strcpy(root->release_agent_path, opts.release_agent);
@@ -3313,20 +3498,6 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
return 0;
}
-static void init_cgroup_css(struct cgroup_subsys_state *css,
- struct cgroup_subsys *ss,
- struct cgroup *cgrp)
-{
- css->cgroup = cgrp;
- atomic_set(&css->refcnt, 1);
- css->flags = 0;
- css->id = NULL;
- if (cgrp == dummytop)
- set_bit(CSS_ROOT, &css->flags);
- BUG_ON(cgrp->subsys[ss->subsys_id]);
- cgrp->subsys[ss->subsys_id] = css;
-}
-
static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
{
/* We need to take each hierarchy_mutex in a consistent order */
@@ -3401,21 +3572,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
for_each_subsys(root, ss) {
- struct cgroup_subsys_state *css = ss->create(ss, cgrp);
-
- if (IS_ERR(css)) {
- err = PTR_ERR(css);
+ err = cgroup_attach_css(ss, cgrp);
+ if (err)
goto err_destroy;
- }
- init_cgroup_css(css, ss, cgrp);
- if (ss->use_id) {
- err = alloc_css_id(ss, parent, cgrp);
- if (err)
- goto err_destroy;
- }
- /* At error, ->destroy() callback has to free assigned ID. */
- if (clone_children(parent) && ss->post_clone)
- ss->post_clone(ss, cgrp);
}
cgroup_lock_hierarchy(root);
--
1.6.3
* [PATCH v2 3/6] cgroups: Allow to unbind subsystem from a cgroup hierarchy
From: Li Zefan @ 2010-12-15 9:35 UTC
To: Andrew Morton
Cc: containers, LKML,
Stephane Eranian, Paul Menage
This allows us to unbind a cgroup subsystem from a hierarchy
which has sub-cgroups in it.
A subsystem that supports unbinding must, when pinning a cgroup
via the css refcnt, use __css_tryget() instead of css_get().
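The difference from css_get() is that the increment only succeeds while the count is non-zero. A user-space sketch of that pattern with C11 atomics is shown below; struct ref_model, inc_not_zero() and tryget_model() are hypothetical names, and the kernel helper itself uses atomic_inc_not_zero():

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a css with a reference count. */
struct ref_model {
	atomic_int refcnt;
	bool is_root;	/* models the CSS_ROOT flag */
};

/* Increment *v unless it is zero; returns true if the increment happened. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Mirrors the shape of __css_tryget(): root objects are never refcounted. */
static bool tryget_model(struct ref_model *r)
{
	if (r->is_root)
		return true;
	return inc_not_zero(&r->refcnt);
}
```

Once the count has been cleared to zero for unbinding, a late caller simply fails to take a reference instead of resurrecting the object.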
Usage:
# mount -t cgroup -o cpuset,cpuacct xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks
(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt
Changelog v2:
- Allow a cgroup subsystem to use css refcnt.
- Add more code comments.
- Use rcu_assign_pointer() in hierarchy_update_css_sets().
- Split can_bind flag to bindable and unbindable flags.
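The unbind path described in the code comments below clears every css refcnt to 0 and fails if any count is above 1, resetting the cleared ones afterwards. One way to picture that clear-or-roll-back step in user space is sketched here. This is an assumption about the general shape only, not the patch's actual implementation, and clear_refs_or_rollback() is a hypothetical name:

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * Try to drop each idle count from 1 to 0. If any entry is still pinned
 * (count != 1), restore the entries already cleared and report -EBUSY.
 */
static int clear_refs_or_rollback(atomic_int *refs, int n)
{
	int i, j;

	for (i = 0; i < n; i++) {
		int expected = 1;

		if (!atomic_compare_exchange_strong(&refs[i], &expected, 0)) {
			/* Someone still holds a reference: undo our work. */
			for (j = 0; j < i; j++)
				atomic_store(&refs[j], 1);
			return -EBUSY;
		}
	}
	return 0;
}
```

Combined with a tryget-style increment that refuses to move a count off zero, no new reference can appear between the clear and the actual unbind.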
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/cgroup.h | 17 ++++++
kernel/cgroup.c | 139 +++++++++++++++++++++++++++++++++++++++++------
2 files changed, 138 insertions(+), 18 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d8c4e22..17579b2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -110,6 +110,18 @@ static inline bool css_is_removed(struct cgroup_subsys_state *css)
}
/*
+ * For a subsystem which supports unbinding, call this to get css
+ * refcnt. Called with rcu_read_lock or cgroup_mutex held.
+ */
+
+static inline bool __css_tryget(struct cgroup_subsys_state *css)
+{
+ if (test_bit(CSS_ROOT, &css->flags))
+ return true;
+ return atomic_inc_not_zero(&css->refcnt);
+}
+
+/*
* Call css_tryget() to take a reference on a css if your existing
* (known-valid) reference isn't already ref-counted. Returns false if
* the css has been destroyed.
@@ -495,6 +507,11 @@ struct cgroup_subsys {
* which has child cgroups.
*/
bool bindable:1;
+ /*
+ * Indicate if this subsystem can be removed from a cgroup hierarchy
+ * which has child cgroups.
+ */
+ bool unbindable:1;
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index caac80f..463575d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1055,12 +1055,61 @@ static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
}
/*
- * After attaching new css objects to the cgroup, we need to entangle
- * them into the existing css_sets.
+ * Reset those css objects whose refcnts are cleared.
*/
-static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+static int hierarchy_reset_css_refs(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ if (atomic_read(&css->refcnt) == 0)
+ atomic_set(&css->refcnt, 1);
+ }
+ return 0;
+}
+
+/*
+ * Clear all the css objects' refcnt to 0. If there's a refcnt > 1,
+ * return failure.
+ */
+static int hierarchy_clear_css_refs(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ struct cgroup_subsys_state *css = cgrp->subsys[i];
+
+ if (atomic_cmpxchg(&css->refcnt, 1, 0) != 1)
+ goto failed;
+ }
+ return 0;
+failed:
+ hierarchy_reset_css_refs(struct cgroup *cgrp, void *data);
+ return -EBUSY;
+}
+
+/*
+ * We're removing some subsystems from cgroup hierarchy, and here we
+ * remove and destroy the css objects from each cgroup.
+ */
+static int hierarchy_remove_css(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ subsys[i]->destroy(subsys[i], cgrp);
+ cgrp->subsys[i] = NULL;
+ }
+
+ return 0;
+}
+
+static int hierarchy_update_css_sets(struct cgroup *cgrp,
+ unsigned long bits, bool add)
{
- unsigned long added_bits = (unsigned long)data;
int i;
struct cg_cgroup_link *link;
@@ -1069,8 +1118,14 @@ static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
struct css_set *cg = link->cg;
struct hlist_head *hhead;
- for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
- rcu_assign_pointer(cg->subsys[i], cgrp->subsys[i]);
+ for_each_set_bit(i, &bits, CGROUP_SUBSYS_COUNT) {
+ if (add)
+ rcu_assign_pointer(cg->subsys[i],
+ cgrp->subsys[i]);
+ else
+ rcu_assign_pointer(cg->subsys[i],
+ dummytop->subsys[i]);
+ }
/* rehash */
hlist_del(&cg->hlist);
@@ -1083,6 +1138,30 @@ static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
}
/*
+ * After attaching new css objects to the cgroup, we need to entangle
+ * them into the existing css_sets.
+ */
+static int hierarchy_add_to_css_sets(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+
+ hierarchy_update_css_sets(cgrp, added_bits, true);
+ return 0;
+}
+
+/*
+ * Before detaching and destroying css objects from the cgroup, we
+ * should detangle them from the existing css_sets.
+ */
+static int hierarchy_remove_from_css_sets(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+
+ hierarchy_update_css_sets(cgrp, removed_bits, false);
+ return 0;
+}
+
+/*
* Re-populate each cgroup directory.
*
* Note root cgroup's inode mutex is held.
@@ -1127,18 +1206,17 @@ static int rebind_subsystems(struct cgroupfs_root *root,
}
}
- /* Removing will be supported later */
- if (root->number_of_cgroups > 1 && removed_bits)
- return -EBUSY;
-
/*
* For non-trivial hierarchy, check that added subsystems
- * are all bindable
+ * are all bindable and removed subsystems are all unbindable
*/
if (root->number_of_cgroups > 1) {
for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
if (!subsys[i]->bindable)
return -EBUSY;
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT)
+ if (!subsys[i]->unbindable)
+ return -EBUSY;
}
/* Attach css objects to the top cgroup */
@@ -1154,9 +1232,14 @@ static int rebind_subsystems(struct cgroupfs_root *root,
err = cgroup_walk_hierarchy(hierarchy_attach_css,
(void *)added_bits, cgrp);
if (err)
- goto failed;
+ goto out;
+
+ err = cgroup_walk_hierarchy(hierarchy_clear_css_refs,
+ (void *)removed_bits, cgrp);
+ if (err)
+ goto out_remove_css;
- cgroup_walk_hierarchy(hierarchy_update_css_sets,
+ cgroup_walk_hierarchy(hierarchy_add_to_css_sets,
(void *)added_bits, cgrp);
/* Process each subsystem */
@@ -1176,11 +1259,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
} else if (bit & removed_bits) {
/* We're removing this subsystem */
BUG_ON(ss == NULL);
- BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
- BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
mutex_lock(&ss->hierarchy_mutex);
- dummytop->subsys[i]->cgroup = dummytop;
- cgrp->subsys[i] = NULL;
if (ss->bind)
ss->bind(ss, dummytop);
subsys[i]->root = &rootnode;
@@ -1206,11 +1285,35 @@ static int rebind_subsystems(struct cgroupfs_root *root,
}
}
root->subsys_bits = root->actual_subsys_bits = final_bits;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
+ BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
+
+ dummytop->subsys[i]->cgroup = dummytop;
+ cgrp->subsys[i] = NULL;
+ }
+
+ cgroup_walk_hierarchy(hierarchy_remove_from_css_sets,
+ (void *)removed_bits, cgrp);
+
+ /*
+ * There might be some pointers to the cgroup_subsys_state
+ * that we are going to destroy.
+ */
+ synchronize_rcu();
+
+ cgroup_walk_hierarchy(hierarchy_remove_css,
+ (void *)removed_bits, cgrp);
+
synchronize_rcu();
return 0;
-failed:
+out_remove_css:
+ cgroup_walk_hierarchy(hierarchy_remove_css,
+ (void *)added_bits, cgrp);
+out:
for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
cgrp->subsys[i] = NULL;
--
1.6.3
* [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable
[not found] ` <4D088BB5.30903-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
` (2 preceding siblings ...)
2010-12-15 9:35 ` [PATCH v2 3/6] cgroups: Allow to unbind subsystem from " Li Zefan
@ 2010-12-15 9:36 ` Li Zefan
2010-12-15 9:36 ` [PATCH v2 5/6] cgroups: Trigger BUG if a bindable subsystem calls css_get() Li Zefan
2010-12-15 9:36 ` [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems Li Zefan
5 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2010-12-15 9:36 UTC (permalink / raw)
To: Andrew Morton
Cc: containers, LKML,
Stephane Eranian, Paul Menage, Ingo Molnar
For those subsystems (debug, cpuacct, net_cls and devices),
setting the bindable/unbindable flag is sufficient.
Mark the freezer subsystem as bindable but not unbindable, because
sub-cgroups can be in the FROZEN state.
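How the two flags gate a remount can be sketched in user space. This is a simplified model under stated assumptions, not the kernel code: struct toy_subsys, the toy_subsys[] table and toy_check_rebind() are hypothetical names, and the flag values merely mirror what this series describes (cpuacct both ways, freezer bindable only, cpuset neither).

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_SUBSYS_COUNT 3

/* Hypothetical, trimmed-down stand-in for struct cgroup_subsys. */
struct toy_subsys {
	const char *name;
	bool bindable:1;	/* may be added to a hierarchy with children */
	bool unbindable:1;	/* may be removed from such a hierarchy */
};

static const struct toy_subsys toy_subsys[TOY_SUBSYS_COUNT] = {
	{ .name = "cpuacct", .bindable = true, .unbindable = true },
	{ .name = "freezer", .bindable = true },
	{ .name = "cpuset" },
};

/*
 * Models the check in rebind_subsystems(): with more than one cgroup in
 * the hierarchy, every added subsystem must be bindable and every
 * removed one unbindable; otherwise the remount fails with -EBUSY.
 */
static int toy_check_rebind(int number_of_cgroups, unsigned long added_bits,
			    unsigned long removed_bits)
{
	int i;

	if (number_of_cgroups <= 1)
		return 0;

	for (i = 0; i < TOY_SUBSYS_COUNT; i++) {
		unsigned long bit = 1UL << i;

		if ((added_bits & bit) && !toy_subsys[i].bindable)
			return -EBUSY;
		if ((removed_bits & bit) && !toy_subsys[i].unbindable)
			return -EBUSY;
	}
	return 0;
}
```

In this model, adding freezer to a populated hierarchy is accepted, while removing it again is refused, which matches the "bindable but not unbindable" marking.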
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cgroup.c | 6 +++++-
kernel/cgroup_freezer.c | 1 +
kernel/sched.c | 2 ++
security/device_cgroup.c | 2 ++
4 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 463575d..fa2c5de 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1063,6 +1063,8 @@ static int hierarchy_reset_css_refs(struct cgroup *cgrp, void *data)
int i;
for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ struct cgroup_subsys_state *css = cgrp->subsys[i];
+
if (atomic_read(&css->refcnt) == 0)
atomic_set(&css->refcnt, 1);
}
@@ -1086,7 +1088,7 @@ static int hierarchy_clear_css_refs(struct cgroup *cgrp, void *data)
}
return 0;
failed:
- hierarchy_reset_css_refs(struct cgroup *cgrp, void *data);
+ hierarchy_reset_css_refs(cgrp, data);
return -EBUSY;
}
@@ -5201,5 +5203,7 @@ struct cgroup_subsys debug_subsys = {
.destroy = debug_destroy,
.populate = debug_populate,
.subsys_id = debug_subsys_id,
+ .bindable = true,
+ .unbindable = true,
};
#endif /* CONFIG_CGROUP_DEBUG */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e7bebb7..213ecd9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -393,4 +393,5 @@ struct cgroup_subsys freezer_subsys = {
.attach = NULL,
.fork = freezer_fork,
.exit = NULL,
+ .bindable = true,
};
diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..930ee2e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9346,6 +9346,8 @@ struct cgroup_subsys cpuacct_subsys = {
.destroy = cpuacct_destroy,
.populate = cpuacct_populate,
.subsys_id = cpuacct_subsys_id,
+ .bindable = true,
+ .unbindable = true,
};
#endif /* CONFIG_CGROUP_CPUACCT */
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 8d9c48f..51321e9 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -473,6 +473,8 @@ struct cgroup_subsys devices_subsys = {
.destroy = devcgroup_destroy,
.populate = devcgroup_populate,
.subsys_id = devices_subsys_id,
+ .bindable = true,
+ .unbindable = true,
};
int devcgroup_inode_permission(struct inode *inode, int mask)
--
1.6.3
* [PATCH v2 5/6] cgroups: Trigger BUG if a bindable subsystem calls css_get()
[not found] ` <4D088BB5.30903-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
` (3 preceding siblings ...)
2010-12-15 9:36 ` [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable Li Zefan
@ 2010-12-15 9:36 ` Li Zefan
2010-12-15 9:36 ` [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems Li Zefan
5 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2010-12-15 9:36 UTC (permalink / raw)
To: Andrew Morton
Cc: containers, LKML,
Stephane Eranian, Paul Menage
For now, unbindable subsystems should not use css_get/put(), so trigger
a BUG on this misuse.
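The flag-based check can be illustrated with a small user-space model. This is a sketch only: the TOY_ constants, struct toy_flag_css and both helpers are hypothetical, plain integers stand in for the kernel's atomic and bitops helpers, and a false return value stands in for BUG_ON(), which would stop the kernel instead.

```c
#include <stdbool.h>

/* Bit numbers mirroring the enum in the patch; TOY_ names are hypothetical. */
enum {
	TOY_CSS_ROOT,
	TOY_CSS_REMOVED,
	TOY_CSS_NO_GET,
};

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct toy_flag_css {
	unsigned long flags;
	int refcnt;
};

/* Models init_cgroup_css(): css of an unbindable subsystem gets NO_GET. */
static void toy_init_css(struct toy_flag_css *css, bool ss_unbindable)
{
	css->refcnt = 1;
	css->flags = 0;
	if (ss_unbindable)
		css->flags |= 1UL << TOY_CSS_NO_GET;
}

/*
 * Returns false where the patched __css_get() would hit BUG_ON();
 * otherwise takes the reference and returns true.
 */
static bool toy_css_get(struct toy_flag_css *css)
{
	if (css->flags & (1UL << TOY_CSS_NO_GET))
		return false;
	css->refcnt++;
	return true;
}
```

The flag is set once at css initialisation, so the check on the get path is a single bit test and the refcount of a flagged css is never touched.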
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/cgroup.h | 7 +++++--
kernel/cgroup.c | 5 +++++
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 17579b2..e8ad9f1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -80,13 +80,15 @@ struct cgroup_subsys_state {
/* bits in struct cgroup_subsys_state flags field */
enum {
- CSS_ROOT, /* This CSS is the root of the subsystem */
- CSS_REMOVED, /* This CSS is dead */
+ CSS_ROOT, /* This CSS is the root of the subsystem */
+ CSS_REMOVED, /* This CSS is dead */
+ CSS_NO_GET, /* Forbid calling css_get/put() */
};
/* Caller must verify that the css is not for root cgroup */
static inline void __css_get(struct cgroup_subsys_state *css, int count)
{
+ BUG_ON(test_bit(CSS_NO_GET, &css->flags));
atomic_add(count, &css->refcnt);
}
@@ -131,6 +133,7 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
{
if (test_bit(CSS_ROOT, &css->flags))
return true;
+ BUG_ON(test_bit(CSS_NO_GET, &css->flags));
while (!atomic_inc_not_zero(&css->refcnt)) {
if (test_bit(CSS_REMOVED, &css->flags))
return false;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index fa2c5de..d49a459 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -938,6 +938,11 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
atomic_set(&css->refcnt, 1);
css->flags = 0;
css->id = NULL;
+
+ /* For now, unbindable subsystems should not call css_get/put(). */
+ if (ss->unbindable)
+ set_bit(CSS_NO_GET, &css->flags);
+
if (cgrp == dummytop)
set_bit(CSS_ROOT, &css->flags);
BUG_ON(cgrp->subsys[ss->subsys_id]);
--
1.6.3
* [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems
[not found] ` <4D088BB5.30903-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
` (4 preceding siblings ...)
2010-12-15 9:36 ` [PATCH v2 5/6] cgroups: Trigger BUG if a bindable subsystem calls css_get() Li Zefan
@ 2010-12-15 9:36 ` Li Zefan
5 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2010-12-15 9:36 UTC (permalink / raw)
To: Andrew Morton
Cc: containers, LKML,
Stephane Eranian, Paul Menage
Provide a usage example, update the bind() callback API, etc.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
Documentation/cgroups/cgroups.txt | 37 +++++++++++++++++++++++++++++--------
1 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..4e772cc 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -363,17 +363,23 @@ Note this will add ns to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
# mount -o remount,ns /dev/cgroup
+Some subsystems can be bound to a mounted hierarchy or removed
+from it, even if the hierarchy has sub-cgroups in it:
+# mount -t cgroup -o freezer hier1 /dev/cgroup
+# echo $$ > /dev/cgroup/my_cgroup
+# mount -o freezer,cpuset hier1 /dev/cgroup
+(failed)
+# mount -o freezer,cpuacct hier1 /dev/cgroup
+# mount -o cpuacct hier1 /dev/cgroup
+
+Note that cpuacct should sit in the default hierarchy before the remount.
+
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /dev/cgroup
Note that specifying 'release_agent' more than once will return failure.
-Note that changing the set of subsystems is currently only supported
-when the hierarchy consists of a single (root) cgroup. Supporting
-the ability to arbitrarily bind/unbind subsystems from an existing
-cgroup hierarchy is intended to be implemented in the future.
-
Then under /dev/cgroup you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /dev/cgroup
is the cgroup that holds the whole system.
@@ -523,6 +529,15 @@ module initcall a call to cgroup_load_subsys(), and in its exitcall a
call to cgroup_unload_subsys(). It should also set its_subsys.module =
THIS_MODULE in its .c file.
+If a subsystem has bindable flag set, normally it has to be able to
+support side-effect free movement of a task into any just-created
+cgroups. i.e. it's probably not suitable for any subsystem where
+can_attach() might return false for the newly-created cgroup, or
+attach() might have side-effects for those same cases.
+
+If a subsystem has unbindable flag set, normally it has to be able to
+support side-effect free movement of a task into the root cgroup.
+
Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
be successful no-ops.
@@ -627,9 +642,15 @@ void bind(struct cgroup_subsys *ss, struct cgroup *root)
(cgroup_mutex and ss->hierarchy_mutex held by caller)
Called when a cgroup subsystem is rebound to a different hierarchy
-and root cgroup. Currently this will only involve movement between
-the default hierarchy (which never has sub-cgroups) and a hierarchy
-that is being created/destroyed (and hence has no sub-cgroups).
+and root cgroup.
+
+For non-bindable subsystems, this will only involve movement
+between the default hierarchy (which never has sub-cgroups) and a
+hierarchy that is being created/destroyed (and hence has no sub-cgroups).
+
+For bindable subsystems, this may also involve movement between the
+default hierarchy and a mounted hierarchy that's populated with
+sub-cgroups.
4. Questions
============
--
1.6.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems
2010-12-15 9:34 [PATCH v2 0/6] cgroups: Bindable cgroup subsystems Li Zefan
` (5 preceding siblings ...)
2010-12-15 9:36 ` [PATCH v2 5/6] cgroups: Triger BUG if a bindable subsystem calls css_get() Li Zefan
@ 2010-12-15 9:36 ` Li Zefan
6 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2010-12-15 9:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Paul Menage, Peter Zijlstra, Hiroyuki KAMEZAWA, Matt Helsley,
Stephane Eranian, LKML, containers
Provide a usage example, update the bind() callback API, etc.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
Documentation/cgroups/cgroups.txt | 37 +++++++++++++++++++++++++++++--------
1 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..4e772cc 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -363,17 +363,23 @@ Note this will add ns to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
# mount -o remount,ns /dev/cgroup
+For some subsystems you can bind them to a mounted hierarchy or
+remove them from it, even if there're sub-cgroups in it:
+# mount -t cgroup -o freezer hier1 /dev/cgroup
+# echo $$ > /dev/cgroup/my_cgroup
+# mount -o freezer,cpuset hier1 /dev/cgroup
+(failed)
+# mount -o freezer,cpuacct hier1 /dev/cgroup
+# mount -o cpuacct hier1 /dev/cgroup
+
+Note cpuacct should be sit in the default hierarchy before remount.
+
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /dev/cgroup
Note that specifying 'release_agent' more than once will return failure.
-Note that changing the set of subsystems is currently only supported
-when the hierarchy consists of a single (root) cgroup. Supporting
-the ability to arbitrarily bind/unbind subsystems from an existing
-cgroup hierarchy is intended to be implemented in the future.
-
Then under /dev/cgroup you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /dev/cgroup
is the cgroup that holds the whole system.
@@ -523,6 +529,15 @@ module initcall a call to cgroup_load_subsys(), and in its exitcall a
call to cgroup_unload_subsys(). It should also set its_subsys.module =
THIS_MODULE in its .c file.
+If a subsystem has the bindable flag set, it normally has to support
+side-effect-free movement of a task into any just-created cgroup,
+i.e. the flag is probably not suitable for a subsystem where
+can_attach() might fail for a newly-created cgroup, or where
+attach() might have side effects in those same cases.
+
+If a subsystem has the unbindable flag set, it normally has to support
+side-effect-free movement of a task into the root cgroup.
+
Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
be successful no-ops.
@@ -627,9 +642,15 @@ void bind(struct cgroup_subsys *ss, struct cgroup *root)
(cgroup_mutex and ss->hierarchy_mutex held by caller)
Called when a cgroup subsystem is rebound to a different hierarchy
-and root cgroup. Currently this will only involve movement between
-the default hierarchy (which never has sub-cgroups) and a hierarchy
-that is being created/destroyed (and hence has no sub-cgroups).
+and root cgroup.
+
+For non-bindable subsystems, this will only involve movement
+between the default hierarchy (which never has sub-cgroups) and a
+hierarchy that is being created/destroyed (and hence has no sub-cgroups).
+
+For bindable subsystems, this may also involve movement between the
+default hierarchy and a mounted hierarchy that's populated with
+sub-cgroups.
4. Questions
============
--
1.6.3