* [RFC PATCH 0/9] Add container support for cgroup
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
Right now, if we mount cgroup inside a container, we get the host's
cgroup information, and we can even change the host's cgroups from
inside the container.

So the container's resource controllers lose their effectiveness.

This patchset tries to add container support for cgroup. The main
idea is to allocate a cgroup super-block for each cgroup hierarchy
mounted in a different pid namespace.

The top cgroup of a container shares its css with the host. When
cgroup is mounted in a container, the tasks in that container are
attached to the top cgroup of the newly mounted hierarchy, and when
cgroup is unmounted in the container, those tasks are attached back
to the host's cgroup.
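Conceptually, for each subsystem bound to the new hierarchy, the
container's top cgroup just takes a reference on the css of the host
cgroup that contains the container's init task. A minimal sketch of
what patch 5/9 does (variable names shortened):

	/* parent: host cgroup of the container's child_reaper */
	css = parent->subsys[ss->subsys_id];
	if (css_tryget(css))
		cgrp->subsys[ss->subsys_id] = css;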
Since the container can change the shared css through its cgroup
subsystem files, patch 8/9 disables write permission on the
container's top cgroup files. On my TODO list the container gets its
own css, at which point this problem disappears.

This patchset is sent as an RFC; any comments are welcome. Maybe this
isn't the best solution; if you have a better one, please let me know.
Gao feng (9):
cgroup: introduce cgroupfs_root flag ROOT_NAMESPACE
cgroup: introduce the top root
cgroup: use root->top_root instead of root
introduce helper function cgroup_in_root
cgroup: add container support for cgroup
pidns: move next_tgid to kernel/pid.c
cgroup: attach container's tasks to proper cgroup
cgroup: disallow container to change top cgroup's subsys files
cgroup: rework cgroup_path
fs/proc/base.c | 43 ------
include/linux/sched.h | 8 +
kernel/cgroup.c | 344 ++++++++++++++++++++++++++++++++++++++----------
kernel/pid.c | 39 ++++++
4 files changed, 319 insertions(+), 115 deletions(-)
--
1.7.7.6
* [RFC PATCH 1/9] cgroup: introduce cgroupfs_root flag ROOT_NAMESPACE
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
1. Add a new flag bit, ROOT_NAMESPACE, to cgroupfs_root. This bit
identifies whether the hierarchy is mounted in a container (i.e. in a
non-init pid namespace).

2. Add a pid_ns field to cgroupfs_root that points to the pid
namespace this hierarchy belongs to.
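With the flag set, the mount shows up with a "namespace" option, so
mount output inside a container would look roughly like this (mount
point and subsystem list are illustrative):

	cgroup /sys/fs/cgroup/cpu cgroup rw,cpu,namespace 0 0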
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f34c41b..7d095b7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -129,6 +129,9 @@ struct cgroupfs_root {
/* Tracks how many cgroups are currently defined in hierarchy.*/
int number_of_cgroups;
+ /* The pid namespace this hierarchy belongs to */
+ struct pid_namespace *pid_ns;
+
/* A list running through the active hierarchies */
struct list_head root_list;
@@ -153,7 +156,9 @@ struct cgroupfs_root {
* subsystems that are otherwise unattached - it never has more than a
* single cgroup, and all tasks are part of that cgroup.
*/
-static struct cgroupfs_root rootnode;
+static struct cgroupfs_root rootnode = {
+ .pid_ns = &init_pid_ns,
+};
/*
* cgroupfs file entry, pointed to from leaf dentry->d_fsdata.
@@ -286,6 +291,7 @@ inline int cgroup_is_removed(const struct cgroup *cgrp)
enum {
ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
ROOT_XATTR, /* supports extended attributes */
+ ROOT_NAMESPACE, /* mounted in container */
};
static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -1100,6 +1106,8 @@ static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",noprefix");
if (test_bit(ROOT_XATTR, &root->flags))
seq_puts(seq, ",xattr");
+ if (test_bit(ROOT_NAMESPACE, &root->flags))
+ seq_puts(seq, ",namespace");
if (strlen(root->release_agent_path))
seq_printf(seq, ",release_agent=%s", root->release_agent_path);
if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->top_cgroup.flags))
@@ -1342,7 +1350,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
removed_mask = root->subsys_mask & ~opts.subsys_mask;
/* Don't allow flags or name to change at remount */
- if (opts.flags != root->flags ||
+ if (opts.flags != (root->flags & ~(1UL << ROOT_NAMESPACE)) ||
(opts.name && strcmp(opts.name, root->name))) {
ret = -EINVAL;
drop_parsed_module_refcounts(opts.subsys_mask);
@@ -1479,6 +1487,10 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
root->subsys_mask = opts->subsys_mask;
root->flags = opts->flags;
ida_init(&root->cgroup_ida);
+ root->pid_ns = get_pid_ns(task_active_pid_ns(current));
+ if (root->pid_ns != &init_pid_ns)
+ set_bit(ROOT_NAMESPACE, &root->flags);
+
if (opts->release_agent)
strcpy(root->release_agent_path, opts->release_agent);
if (opts->name)
@@ -1498,6 +1510,7 @@ static void cgroup_drop_root(struct cgroupfs_root *root)
ida_remove(&hierarchy_ida, root->hierarchy_id);
spin_unlock(&hierarchy_id_lock);
ida_destroy(&root->cgroup_ida);
+ put_pid_ns(root->pid_ns);
kfree(root);
}
--
1.7.7.6
* [RFC PATCH 2/9] cgroup: introduce the top root
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
We will allow a container to allocate a new hierarchy, but a cgroup
subsystem can only be linked with one cgroupfs_root, so we need a new
cgroupfs_root field, top_root, that records the root this hierarchy's
subsystems are actually linked to.

Add a helper function, find_top_root, to look up the top_root.

Also rename root_count to top_root_count.
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 29 ++++++++++++++++++++++++-----
1 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7d095b7..27ebeaf 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -132,6 +132,9 @@ struct cgroupfs_root {
/* The pid namespace this hierarchy belongs to */
struct pid_namespace *pid_ns;
+ /* The top root,subsystem only links with top root */
+ struct cgroupfs_root *top_root;
+
/* A list running through the active hierarchies */
struct list_head root_list;
@@ -234,7 +237,7 @@ struct cgroup_event {
/* The list of hierarchy roots */
static LIST_HEAD(roots);
-static int root_count;
+static int top_root_count;
static DEFINE_IDA(hierarchy_ida);
static int next_hierarchy_id;
@@ -682,7 +685,7 @@ static struct css_set *find_css_set(
return NULL;
/* Allocate all the cg_cgroup_link objects that we'll need */
- if (allocate_cg_links(root_count, &tmp_cg_links) < 0) {
+ if (allocate_cg_links(top_root_count, &tmp_cg_links) < 0) {
kfree(res);
return NULL;
}
@@ -847,6 +850,20 @@ static struct backing_dev_info cgroup_backing_dev_info = {
static int alloc_css_id(struct cgroup_subsys *ss,
struct cgroup *parent, struct cgroup *child);
+/*
+ * Find out the top root by subsys_mask.
+ */
+static struct cgroupfs_root *find_top_root(unsigned long subsys_mask)
+{
+ struct cgroupfs_root *existing_root;
+ for_each_active_root(existing_root) {
+ if (!test_bit(ROOT_NAMESPACE, &existing_root->flags) &&
+ existing_root->subsys_mask == subsys_mask)
+ return existing_root;
+ }
+ return NULL;
+}
+
static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
{
struct inode *inode = new_inode(sb);
@@ -1416,6 +1433,7 @@ static void init_cgroup_root(struct cgroupfs_root *root)
INIT_LIST_HEAD(&root->root_list);
INIT_LIST_HEAD(&root->allcg_list);
root->number_of_cgroups = 1;
+ root->top_root = root;
cgrp->root = root;
cgrp->top_cgroup = cgrp;
init_cgroup_housekeeping(cgrp);
@@ -1656,7 +1674,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
BUG_ON(ret);
list_add(&root->root_list, &roots);
- root_count++;
+ top_root_count++;
sb->s_root->d_fsdata = root_cgrp;
root->top_cgroup.dentry = sb->s_root;
@@ -1749,7 +1767,8 @@ static void cgroup_kill_sb(struct super_block *sb) {
if (!list_empty(&root->root_list)) {
list_del(&root->root_list);
- root_count--;
+ if (!test_bit(ROOT_NAMESPACE, &root->flags))
+ top_root_count--;
}
mutex_unlock(&cgroup_root_mutex);
@@ -4635,7 +4654,7 @@ int __init cgroup_init_early(void)
INIT_HLIST_NODE(&init_css_set.hlist);
css_set_count = 1;
init_cgroup_root(&rootnode);
- root_count = 1;
+ top_root_count = 1;
init_task.cgroups = &init_css_set;
init_css_set_link.cg = &init_css_set;
--
1.7.7.6
* [RFC PATCH 3/9] cgroup: use root->top_root instead of root
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
After adding container support for cgroup, a subsystem can be bound
to many hierarchies, so cgroups may have different roots even when
those roots have the same subsys_mask. Since such roots always share
the same top_root, compare top_root instead of root.

Also link each cgroup's allcg_node into its top_root's allcg_list.
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 27ebeaf..2e14c8f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -315,7 +315,7 @@ static int notify_on_release(const struct cgroup *cgrp)
* an active hierarchy
*/
#define for_each_subsys(_root, _ss) \
-list_for_each_entry(_ss, &_root->subsys_list, sibling)
+list_for_each_entry(_ss, &_root->top_root->subsys_list, sibling)
/* for_each_active_root() allows you to iterate across the active hierarchies */
#define for_each_active_root(_root) \
@@ -519,7 +519,7 @@ static bool compare_css_sets(struct css_set *cg,
cg1 = cgl1->cgrp;
cg2 = cgl2->cgrp;
/* Hierarchies should be linked in the same order. */
- BUG_ON(cg1->root != cg2->root);
+ BUG_ON(cg1->root->top_root != cg2->root->top_root);
/*
* If this hierarchy is the hierarchy of the cgroup
@@ -528,7 +528,7 @@ static bool compare_css_sets(struct css_set *cg,
* hierarchy, then this css_set should point to the
* same cgroup as the old css_set.
*/
- if (cg1->root == new_cgrp->root) {
+ if (cg1->root->top_root == new_cgrp->root->top_root) {
if (cg1 != new_cgrp)
return false;
} else {
@@ -703,7 +703,7 @@ static struct css_set *find_css_set(
/* Add reference counts and links from the new css_set. */
list_for_each_entry(link, &oldcg->cg_links, cg_link_list) {
struct cgroup *c = link->cgrp;
- if (c->root == cgrp->root)
+ if (c->root->top_root == cgrp->root->top_root)
c = cgrp;
link_css_set(&tmp_cg_links, res, c);
}
@@ -745,7 +745,7 @@ static struct cgroup *task_cgroup_from_root(struct task_struct *task,
struct cg_cgroup_link *link;
list_for_each_entry(link, &css->cg_links, cg_link_list) {
struct cgroup *c = link->cgrp;
- if (c->root == root) {
+ if (c->root->top_root == root->top_root) {
res = c;
break;
}
@@ -2840,7 +2840,8 @@ static void cgroup_cfts_commit(struct cgroup_subsys *ss,
/* %NULL @cfts indicates abort and don't bother if @ss isn't attached */
if (cfts && ss->root != &rootnode) {
- list_for_each_entry(cgrp, &ss->root->allcg_list, allcg_node) {
+ list_for_each_entry(cgrp, &ss->root->top_root->allcg_list,
+ allcg_node) {
dget(cgrp->dentry);
list_add_tail(&cgrp->cft_q_node, &pending);
}
@@ -4207,7 +4208,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
/* allocation complete, commit to creation */
dentry->d_fsdata = cgrp;
cgrp->dentry = dentry;
- list_add_tail(&cgrp->allcg_node, &root->allcg_list);
+ list_add_tail(&cgrp->allcg_node, &root->top_root->allcg_list);
list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
root->number_of_cgroups++;
--
1.7.7.6
* [RFC PATCH 4/9] introduce helper function cgroup_in_root
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
After adding container support for cgroup, there can be many roots.

Add a helper function, cgroup_in_root, so that we only operate on a
cgroup when it actually belongs to the given root.
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2e14c8f..0195db1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -850,6 +850,14 @@ static struct backing_dev_info cgroup_backing_dev_info = {
static int alloc_css_id(struct cgroup_subsys *ss,
struct cgroup *parent, struct cgroup *child);
+static inline bool cgroup_in_root(struct cgroup *cgrp,
+ struct cgroupfs_root *root)
+{
+ if (root == cgrp->root)
+ return true;
+ return false;
+}
+
/*
* Find out the top root by subsys_mask.
*/
@@ -2047,7 +2055,8 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
cgroup_lock();
for_each_active_root(root) {
struct cgroup *from_cg = task_cgroup_from_root(from, root);
-
+ if (!cgroup_in_root(from_cg, root))
+ continue;
retval = cgroup_attach_task(from_cg, tsk);
if (retval)
break;
@@ -4786,6 +4795,10 @@ static int proc_cgroup_show(struct seq_file *m, void *v)
struct cgroup *cgrp;
int count = 0;
+ cgrp = task_cgroup_from_root(tsk, root);
+ if (!cgroup_in_root(cgrp, root))
+ continue;
+
seq_printf(m, "%d:", root->hierarchy_id);
for_each_subsys(root, ss)
seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
@@ -4793,7 +4806,7 @@ static int proc_cgroup_show(struct seq_file *m, void *v)
seq_printf(m, "%sname=%s", count ? "," : "",
root->name);
seq_putc(m, ':');
- cgrp = task_cgroup_from_root(tsk, root);
+
retval = cgroup_path(cgrp, buf, PAGE_SIZE);
if (retval < 0)
goto out_unlock;
--
1.7.7.6
* [RFC PATCH 5/9] cgroup: add container support for cgroup
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
With this patch, a cgroup hierarchy mounted in a container gets its
own cgroupfs_root.

The css of this hierarchy's top cgroup is the same as the css of the
container's init task.
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 216 +++++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 162 insertions(+), 54 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0195db1..ac61027 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1024,21 +1024,13 @@ static void cgroup_d_remove_dir(struct dentry *dentry)
remove_dir(dentry);
}
-/*
- * Call with cgroup_mutex held. Drops reference counts on modules, including
- * any duplicate ones that parse_cgroupfs_options took. If this function
- * returns an error, no reference counts are touched.
- */
-static int rebind_subsystems(struct cgroupfs_root *root,
- unsigned long final_subsys_mask)
+static int __rebind_subsystems(struct cgroupfs_root *root,
+ unsigned long final_subsys_mask)
{
unsigned long added_mask, removed_mask;
struct cgroup *cgrp = &root->top_cgroup;
int i;
- BUG_ON(!mutex_is_locked(&cgroup_mutex));
- BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
-
removed_mask = root->actual_subsys_mask & ~final_subsys_mask;
added_mask = final_subsys_mask & ~root->actual_subsys_mask;
/* Check that any added subsystems are currently free */
@@ -1059,13 +1051,6 @@ static int rebind_subsystems(struct cgroupfs_root *root,
}
}
- /* Currently we don't handle adding/removing subsystems when
- * any child cgroups exist. This is theoretically supportable
- * but involves complex error handling, so it's being left until
- * later */
- if (root->number_of_cgroups > 1)
- return -EBUSY;
-
/* Process each subsystem */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
@@ -1113,6 +1098,117 @@ static int rebind_subsystems(struct cgroupfs_root *root,
BUG_ON(cgrp->subsys[i]);
}
}
+
+ return 0;
+}
+
+static int __rebind_subsystems_ns(struct cgroupfs_root *root,
+ unsigned long final_subsys_mask)
+{
+ unsigned long added_mask, removed_mask;
+ struct cgroup *cgrp = &root->top_cgroup;
+ struct cgroup *parent = NULL;
+ struct cgroupfs_root *top_root = NULL;
+ unsigned long bit;
+ int i;
+
+ removed_mask = root->actual_subsys_mask & ~final_subsys_mask;
+ added_mask = final_subsys_mask & ~root->actual_subsys_mask;
+
+ /* Get new top root and new parent */
+ if (final_subsys_mask) {
+ top_root = find_top_root(final_subsys_mask);
+ if (top_root == NULL)
+ return -EINVAL;
+
+ parent = task_cgroup_from_root(root->pid_ns->child_reaper,
+ top_root);
+ BUG_ON(parent == NULL);
+ }
+
+ /* Process each subsystem */
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ struct cgroup_subsys_state *css;
+ bit = 1UL << i;
+ if (bit & added_mask) {
+ BUG_ON(cgrp->subsys[i]);
+ BUG_ON(parent->subsys[ss->subsys_id] == NULL);
+
+ css = parent->subsys[ss->subsys_id];
+ if (!css_tryget(css))
+ goto out;
+ cgrp->subsys[ss->subsys_id] = css;
+
+ /* refcount was already taken, and we're keeping it */
+ } else if (bit & removed_mask) {
+ BUG_ON(cgrp->subsys[i] != cgrp->parent->subsys[i]);
+
+ css_put(cgrp->subsys[i]);
+ cgrp->subsys[i] = NULL;
+
+ /* subsystem is now free - drop reference on module */
+ module_put(ss->module);
+ } else if (bit & final_subsys_mask) {
+ /*
+ * a refcount was taken, but we already had one, so
+ * drop the extra reference.
+ */
+ module_put(ss->module);
+ }
+ }
+
+ root->top_root = top_root;
+ cgrp->parent = parent;
+
+ /* Link to new top_root or unlink when umounting */
+ if (top_root)
+ list_move_tail(&cgrp->allcg_node, &top_root->allcg_list);
+ else
+ list_del_init(&cgrp->allcg_node);
+
+ return 0;
+out:
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ bit = 1UL << i;
+ if ((bit & added_mask) && cgrp->subsys[i]) {
+ css_put(cgrp->subsys[i]);
+ cgrp->subsys[i] = NULL;
+ }
+ }
+ return -EINVAL;
+}
+
+
+/*
+ * Call with cgroup_mutex held. Drops reference counts on modules, including
+ * any duplicate ones that parse_cgroupfs_options took. If this function
+ * returns an error, no reference counts are touched.
+ */
+static int rebind_subsystems(struct cgroupfs_root *root,
+ unsigned long final_subsys_mask)
+{
+ int err = 0;
+
+ BUG_ON(!mutex_is_locked(&cgroup_mutex));
+ BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
+
+ /* Currently we don't handle adding/removing subsystems when
+ * any child cgroups exist. This is theoretically supportable
+ * but involves complex error handling, so it's being left until
+ * later */
+ if (root->number_of_cgroups > 1)
+ return -EBUSY;
+
+ if (test_bit(ROOT_NAMESPACE, &root->flags))
+ err = __rebind_subsystems_ns(root, final_subsys_mask);
+ else
+ err = __rebind_subsystems(root, final_subsys_mask);
+
+ if (err)
+ return err;
+
+
root->subsys_mask = root->actual_subsys_mask = final_subsys_mask;
synchronize_rcu();
@@ -1490,6 +1586,10 @@ static int cgroup_test_super(struct super_block *sb, void *data)
&& (opts->subsys_mask != root->subsys_mask))
return 0;
+ /* Pid namespace must match too */
+ if (root->pid_ns != task_active_pid_ns(current))
+ return 0;
+
return 1;
}
@@ -1656,52 +1756,60 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
if (!strcmp(existing_root->name, root->name))
goto unlock_drop;
- /*
- * We're accessing css_set_count without locking
- * css_set_lock here, but that's OK - it can only be
- * increased by someone holding cgroup_lock, and
- * that's us. The worst that can happen is that we
- * have some link structures left over
- */
- ret = allocate_cg_links(css_set_count, &tmp_cg_links);
- if (ret)
- goto unlock_drop;
+ if (!test_bit(ROOT_NAMESPACE, &root->flags)) {
+ /*
+ * We're accessing css_set_count without locking
+ * css_set_lock here, but that's OK - it can only be
+ * increased by someone holding cgroup_lock, and
+ * that's us. The worst that can happen is that we
+ * have some link structures left over
+ */
+ ret = allocate_cg_links(css_set_count, &tmp_cg_links);
+ if (ret)
+ goto unlock_drop;
+
+ ret = rebind_subsystems(root, root->subsys_mask);
+ if (ret == -EBUSY) {
+ free_cg_links(&tmp_cg_links);
+ goto unlock_drop;
+ }
+ /*
+ * There must be no failure case after here, since
+ * rebinding takes care of subsystems' refcounts,
+ * which are explicitly dropped in the failure exit
+ * path.
+ */
+
+ /* EBUSY should be the only error here */
+ BUG_ON(ret);
+ top_root_count++;
+
+ /* Link the top cgroup in this hierarchy into all
+ * the css_set objects */
+ write_lock(&css_set_lock);
+ for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
+ struct hlist_head *hhead = &css_set_table[i];
+ struct hlist_node *node;
+ struct css_set *cg;
+
+ hlist_for_each_entry(cg, node, hhead, hlist)
+ link_css_set(&tmp_cg_links, cg,
+ root_cgrp);
+ }
+ write_unlock(&css_set_lock);
- ret = rebind_subsystems(root, root->subsys_mask);
- if (ret == -EBUSY) {
free_cg_links(&tmp_cg_links);
- goto unlock_drop;
+ } else {
+ ret = rebind_subsystems(root, root->subsys_mask);
+ if (ret)
+ goto unlock_drop;
}
- /*
- * There must be no failure case after here, since rebinding
- * takes care of subsystems' refcounts, which are explicitly
- * dropped in the failure exit path.
- */
-
- /* EBUSY should be the only error here */
- BUG_ON(ret);
list_add(&root->root_list, &roots);
- top_root_count++;
sb->s_root->d_fsdata = root_cgrp;
root->top_cgroup.dentry = sb->s_root;
- /* Link the top cgroup in this hierarchy into all
- * the css_set objects */
- write_lock(&css_set_lock);
- for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
- struct hlist_head *hhead = &css_set_table[i];
- struct hlist_node *node;
- struct css_set *cg;
-
- hlist_for_each_entry(cg, node, hhead, hlist)
- link_css_set(&tmp_cg_links, cg, root_cgrp);
- }
- write_unlock(&css_set_lock);
-
- free_cg_links(&tmp_cg_links);
-
BUG_ON(!list_empty(&root_cgrp->children));
BUG_ON(root->number_of_cgroups != 1);
--
1.7.7.6
* [RFC PATCH 6/9] pidns: move next_tgid to kernel/pid.c
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
cgroup will use next_tgid to iterate over the tasks in a pid namespace.
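The intended iteration pattern (the one used by the next patch in
this series) is:

	struct tgid_iter iter;

	iter.task = NULL;
	iter.tgid = 1;
	for (iter = next_tgid(ns, iter); iter.task;
	     iter.tgid += 1, iter = next_tgid(ns, iter)) {
		/*
		 * iter.task is a thread group leader in ns;
		 * next_tgid() takes a reference on it and drops
		 * the reference on the previously returned task.
		 */
	}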
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
fs/proc/base.c | 43 -------------------------------------------
include/linux/sched.h | 8 ++++++++
kernel/pid.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 47 insertions(+), 43 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 144a967..868c4ed 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2795,49 +2795,6 @@ out:
return result;
}
-/*
- * Find the first task with tgid >= tgid
- *
- */
-struct tgid_iter {
- unsigned int tgid;
- struct task_struct *task;
-};
-static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter)
-{
- struct pid *pid;
-
- if (iter.task)
- put_task_struct(iter.task);
- rcu_read_lock();
-retry:
- iter.task = NULL;
- pid = find_ge_pid(iter.tgid, ns);
- if (pid) {
- iter.tgid = pid_nr_ns(pid, ns);
- iter.task = pid_task(pid, PIDTYPE_PID);
- /* What we to know is if the pid we have find is the
- * pid of a thread_group_leader. Testing for task
- * being a thread_group_leader is the obvious thing
- * todo but there is a window when it fails, due to
- * the pid transfer logic in de_thread.
- *
- * So we perform the straight forward test of seeing
- * if the pid we have found is the pid of a thread
- * group leader, and don't worry if the task we have
- * found doesn't happen to be a thread group leader.
- * As we don't care in the case of readdir.
- */
- if (!iter.task || !has_group_leader_pid(iter.task)) {
- iter.tgid += 1;
- goto retry;
- }
- get_task_struct(iter.task);
- }
- rcu_read_unlock();
- return iter;
-}
-
#define TGID_OFFSET (FIRST_PROCESS_ENTRY + ARRAY_SIZE(proc_base_stuff))
static int proc_pid_fill_cache(struct file *filp, void *dirent, filldir_t filldir,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0dd42a0..9fde2ed 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2128,6 +2128,14 @@ extern struct task_struct *find_task_by_vpid(pid_t nr);
extern struct task_struct *find_task_by_pid_ns(pid_t nr,
struct pid_namespace *ns);
+struct tgid_iter {
+ unsigned int tgid;
+ struct task_struct *task;
+};
+
+extern struct tgid_iter next_tgid(struct pid_namespace *ns,
+ struct tgid_iter iter);
+
extern void __set_special_pids(struct pid *pid);
/* per-UID process charging. */
diff --git a/kernel/pid.c b/kernel/pid.c
index aebd4f5..7a1341d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -538,6 +538,45 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
}
/*
+ * Find the first task with tgid >= tgid
+ *
+ */
+struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter)
+{
+ struct pid *pid;
+
+ if (iter.task)
+ put_task_struct(iter.task);
+ rcu_read_lock();
+retry:
+ iter.task = NULL;
+ pid = find_ge_pid(iter.tgid, ns);
+ if (pid) {
+ iter.tgid = pid_nr_ns(pid, ns);
+ iter.task = pid_task(pid, PIDTYPE_PID);
+ /* What we to know is if the pid we have find is the
+ * pid of a thread_group_leader. Testing for task
+ * being a thread_group_leader is the obvious thing
+ * todo but there is a window when it fails, due to
+ * the pid transfer logic in de_thread.
+ *
+ * So we perform the straight forward test of seeing
+ * if the pid we have found is the pid of a thread
+ * group leader, and don't worry if the task we have
+ * found doesn't happen to be a thread group leader.
+ * As we don't care in the case of readdir.
+ */
+ if (!iter.task || !has_group_leader_pid(iter.task)) {
+ iter.tgid += 1;
+ goto retry;
+ }
+ get_task_struct(iter.task);
+ }
+ rcu_read_unlock();
+ return iter;
+}
+
+/*
* The pid hash table is scaled according to the amount of memory in the
* machine. From a minimum of 16 slots up to 4096 slots at one gigabyte or
* more.
--
1.7.7.6
* [RFC PATCH 7/9] cgroup: attach container's tasks to proper cgroup
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
If cgroup is mounted in a container, move the container's tasks to
the container's top cgroup.

If cgroup is unmounted in a container, move the container's tasks back
to the host's cgroup.
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ac61027..e077660 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1110,6 +1110,7 @@ static int __rebind_subsystems_ns(struct cgroupfs_root *root,
struct cgroup *parent = NULL;
struct cgroupfs_root *top_root = NULL;
unsigned long bit;
+ struct tgid_iter iter;
int i;
removed_mask = root->actual_subsys_mask & ~final_subsys_mask;
@@ -1126,6 +1127,25 @@ static int __rebind_subsystems_ns(struct cgroupfs_root *root,
BUG_ON(parent == NULL);
}
+ /*
+ * Attach container's tasks to host's cgroup,since this
+ * top cgroup may be destroyed or changed. If task attaching
+ * failed, return error immediately.
+ */
+ if (cgrp->parent) {
+ int error;
+ iter.task = NULL;
+ iter.tgid = 1;
+ for (iter = next_tgid(root->pid_ns, iter); iter.task;
+ iter.tgid += 1, iter = next_tgid(root->pid_ns, iter)) {
+ error = cgroup_attach_task(cgrp->parent, iter.task);
+ if (error && error != -ESRCH) {
+ put_task_struct(iter.task);
+ return error;
+ }
+ }
+ }
+
/* Process each subsystem */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
@@ -1158,15 +1178,31 @@ static int __rebind_subsystems_ns(struct cgroupfs_root *root,
}
}
- root->top_root = top_root;
- cgrp->parent = parent;
-
/* Link to new top_root or unlink when umounting */
- if (top_root)
+ if (top_root) {
+ root->top_root = top_root;
+ cgrp->parent = parent;
+ /*
+ * Attach container's tasks to the new top cgroup.
+ * We don't care whether some tasks are not attached
+ * to this cgroup successfully, it totally does no harm.
+ */
+ iter.task = NULL;
+ iter.tgid = 1;
+ for (iter = next_tgid(root->pid_ns, iter); iter.task;
+ iter.tgid += 1, iter = next_tgid(root->pid_ns, iter)) {
+ cgroup_attach_task(cgrp, iter.task);
+ }
+ atomic_inc(&top_root->sb->s_active);
+ top_root->number_of_cgroups++;
list_move_tail(&cgrp->allcg_node, &top_root->allcg_list);
- else
+ } else {
+ root->top_root->number_of_cgroups--;
+ atomic_dec(&root->top_root->sb->s_active);
+ root->top_root = NULL;
+ cgrp->parent = NULL;
list_del_init(&cgrp->allcg_node);
-
+ }
return 0;
out:
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
--
1.7.7.6
* [RFC PATCH 8/9] cgroup: disallow container to change top cgroup's subsys files
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
This patch disallows a container to change its top cgroup's subsystem
files, since these files are shared with the host.
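For example, a subsystem file that cgroup_file_mode() would create as
0644 (-rw-r--r--) on the host ends up as 0444 (-r--r--r--) in the top
cgroup of the container's hierarchy.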
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e077660..b0caa1d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2928,6 +2928,14 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
}
mode = cgroup_file_mode(cft);
+ /*
+ * Disallow container to change it's top cgroup's subsys files,
+ * since these files are shared with host.
+ */
+ if (test_bit(ROOT_NAMESPACE, &cgrp->root->flags) &&
+ cgrp == cgrp->top_cgroup)
+ mode &= ~S_IWUSR;
+
error = cgroup_create_file(dentry, mode | S_IFREG, cgrp->root->sb);
if (!error) {
cfe->type = (void *)cft;
--
1.7.7.6
* [RFC PATCH 9/9] cgroup: rework cgroup_path
From: Gao feng @ 2012-12-17 6:43 UTC
To: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ, Gao feng
Use cgrp == cgrp->top_cgroup instead of cgrp == NULL as the
termination test in cgroup_path(), since a container's top cgroup has
a non-NULL parent in the host's hierarchy.
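With this, cgroup_path() (and therefore /proc/<pid>/cgroup) reports
paths relative to the hierarchy's own top cgroup. For a container
whose top cgroup sits at /lxc/c1 on the host (names illustrative),
the same task would be reported as:

	3:cpu:/		(inside the container)
	3:cpu:/lxc/c1	(on the host)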
Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
kernel/cgroup.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b0caa1d..c111cf9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1976,12 +1976,14 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
if ((start -= len) < buf)
return -ENAMETOOLONG;
memcpy(start, dentry->d_name.name, len);
- cgrp = cgrp->parent;
- if (!cgrp)
+
+ if (cgrp == cgrp->top_cgroup)
break;
+ cgrp = cgrp->parent;
dentry = cgrp->dentry;
- if (!cgrp->parent)
+
+ if (cgrp == cgrp->top_cgroup)
continue;
if (--start < buf)
return -ENAMETOOLONG;
--
1.7.7.6
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Glauber Costa @ 2012-12-17 8:08 UTC
To: Gao feng
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw
On 12/17/2012 10:43 AM, Gao feng wrote:
> Right now, if we mount cgroup inside a container, we get the host's
> cgroup information, and we can even change the host's cgroups from
> inside the container.
>
> So the container's resource controllers lose their effectiveness.
>
> This patchset tries to add container support for cgroup. The main
> idea is to allocate a cgroup super-block for each cgroup hierarchy
> mounted in a different pid namespace.
>
> The top cgroup of a container shares its css with the host. When
> cgroup is mounted in a container, the tasks in that container are
> attached to the top cgroup of the newly mounted hierarchy, and when
> cgroup is unmounted in the container, those tasks are attached back
> to the host's cgroup.
>
> Since the container can change the shared css through its cgroup
> subsystem files, patch 8/9 disables write permission on the
> container's top cgroup files. On my TODO list the container gets its
> own css, at which point this problem disappears.
>
> This patchset is sent as an RFC; any comments are welcome. Maybe this
> isn't the best solution; if you have a better one, please let me know.
Question 1:
Any particular reason to have picked the pid namespace?
Maybe it is the right thing, since we are basically dealing with
grouping of tasks. OTOH, what you are doing sounds very much like
a private mount, indicating that the mount namespace should be used.
This needs to be well justified.
Also, "container support" can really mean a lot of things. I am still
trying, while reading your patches, to figure out what exactly do you
want to achieve. What it seems so far is that you want an unprivileged
process living inside a namespace to manipulate the cgroup hierarchy and
have its own copy of the cgroup tree, laid as it pleases. You also want
to be able to write PIDs as seen by the containing namespace, and to
have it somehow translated. Am I right?
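(For reference, the write side of the tasks file already translates
in that direction; attach_task_by_pid() does

	tsk = find_task_by_vpid(pid); /* pid as the writer's ns sees it */

so I assume the interesting part is the read side and the layout of
the hierarchy itself.)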
For future submissions, could you make this clearer?
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Gao feng @ 2012-12-17 8:54 UTC
To: Glauber Costa
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw
Hi Glauber
On 2012/12/17 16:08, Glauber Costa wrote:
> On 12/17/2012 10:43 AM, Gao feng wrote:
>> Right now, if we mount cgroup inside a container, we get the host's
>> cgroup information, and we can even change the host's cgroups from
>> inside the container.
>>
>> So the container's resource controllers lose their effectiveness.
>>
>> This patchset tries to add container support for cgroup. The main
>> idea is to allocate a cgroup super-block for each cgroup hierarchy
>> mounted in a different pid namespace.
>>
>> The top cgroup of a container shares its css with the host. When
>> cgroup is mounted in a container, the tasks in that container are
>> attached to the top cgroup of the newly mounted hierarchy, and when
>> cgroup is unmounted in the container, those tasks are attached back
>> to the host's cgroup.
>>
>> Since the container can change the shared css through its cgroup
>> subsystem files, patch 8/9 disables write permission on the
>> container's top cgroup files. On my TODO list the container gets its
>> own css, at which point this problem disappears.
>>
>> This patchset is sent as an RFC; any comments are welcome. Maybe this
>> isn't the best solution; if you have a better one, please let me know.
>
>
> Question 1:
>
> Any particular reason to have picked the pid namespace?
>
> Maybe it is the right thing, since we are basically dealing with
> grouping of tasks. OTOH, what you are doing sounds very much like
> a private mount, indicating that the mount namespace should be used.
> This needs to be well justified.
Consider this situation: a container has only its mnt ns unshared from
the host, and then we mount cgroup in this container. We don't know
what we can do with this new cgroup; we don't know which tasks should
be attached to it, short of finding every task in this mnt ns.

I want to make things clear: all tasks in a pid ns use the cgroup
mounted in that pid ns.
>
> Also, "container support" can really mean a lot of things. I am still
> trying, while reading your patches, to figure out what exactly do you
> want to achieve. What it seems so far is that you want an unprivileged
> process living inside a namespace to manipulate the cgroup hierarchy and
> have its own copy of the cgroup tree, laid as it pleases. You also want
> to be able to write PIDs as seen by the containing namespace, and to
> have it somehow translated. Am I right?
>
Yes, you are right :)

"container support" is confusing; I will change it to "pidns support".

> For future submissions, could you make this clearer?

Will do, thanks for your comment!
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Serge Hallyn @ 2012-12-17 13:16 UTC
To: Gao feng
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w, glommer-bzQdu9zFT3WakBO8gow8eQ
Quoting Gao feng (gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
> Right now, if we mount cgroup inside a container, we get the host's
> cgroup information, and we can even change the host's cgroups from
> inside the container.
>
> So the container's resource controllers lose their effectiveness.
>
> This patchset tries to add container support for cgroup. The main
> idea is to allocate a cgroup super-block for each cgroup hierarchy
> mounted in a different pid namespace.
>
> The top cgroup of a container shares its css with the host. When
> cgroup is mounted in a container, the tasks in that container are
> attached to the top cgroup of the newly mounted hierarchy, and when
> cgroup is unmounted in the container, those tasks are attached back
> to the host's cgroup.
>
> Since the container can change the shared css through its cgroup
> subsystem files, patch 8/9 disables write permission on the
> container's top cgroup files. On my TODO list the container gets its
> own css, at which point this problem disappears.
>
> This patchset is sent as an RFC; any comments are welcome. Maybe this
> isn't the best solution; if you have a better one, please let me know.
Sounds very interesting, thanks. I'm out (and mostly AFK) but
will take a look on wed or thu.
-serge
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Tejun Heo @ 2012-12-17 23:48 UTC
To: Gao feng
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w, cgroups-u79uwXL29TY76Z2rM5mHXA
Hello,
On Mon, Dec 17, 2012 at 02:43:26PM +0800, Gao feng wrote:
> Right now, if we mount cgroup inside a container, we get the host's
> cgroup information, and we can even change the host's cgroups from
> inside the container.
>
> So the container's resource controllers lose their effectiveness.
>
> This patchset tries to add container support for cgroup. The main
> idea is to allocate a cgroup super-block for each cgroup hierarchy
> mounted in a different pid namespace.
>
> The top cgroup of a container shares its css with the host. When
> cgroup is mounted in a container, the tasks in that container are
> attached to the top cgroup of the newly mounted hierarchy, and when
> cgroup is unmounted in the container, those tasks are attached back
> to the host's cgroup.
>
> Since the container can change the shared css through its cgroup
> subsystem files, patch 8/9 disables write permission on the
> container's top cgroup files. On my TODO list the container gets its
> own css, at which point this problem disappears.
>
> This patchset is sent as an RFC; any comments are welcome. Maybe this
> isn't the best solution; if you have a better one, please let me know.
So, I'm *highly* unlikely to accept any patches which try to add
namespace support directly to cgroup in any form unless someone can
definitively show me this can't be done using FUSE or other userland
solutions.
cgroupfs is going to be an interface to expose the resource control
facilities of the kernel and that's the extent the interface will be
capable of. It in itself won't support delegation of resource
policies to namespaces or unprivileged users.
Although I don't have anything concrete yet, the tentative plan is to
have something in userland which can integrate with the base system so
that userland has a unified and controlled way to interact with
cgroup which can be easily integrated with the rest of the base system
and kernel has at least some level of interface isolation. Basically,
something like libudev or libsysfs.
So, if people want to allow NSes to control their subtrees of cgroups, I
really want something in the userland which sits between the NSes and
actual cgroup, and I bet things would actually be much better that
way. cgroupfs seems to allow it but you can't really delegate
management of subtree easily. Controllers would collapse with
increasing level of nesting, root cgroups have different knobs or
different interpretations of the same knobs, and siblings interact
with each other, and I don't think making the cgroupfs interface generic
enough so that it can be used for all that is desirable or a
worthwhile effort.
Thanks.
--
tejun
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Eric W. Biederman @ 2012-12-17 23:54 UTC
To: Tejun Heo
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> writes:
> So, I'm *highly* unlikely to accept any patches which try to add
> namespace support directly to cgroup in any form unless someone can
> definitively show me this can't be done using FUSE or other userland
> solutions.
FUSE doesn't work in user namespaces. Nor have I seen any interest by
FUSE developers to even comment on patches doing anything of the sort.
So at the moment FUSE is not useful as even a rhetorical point. FUSE is
DOA.
Eric
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Tejun Heo @ 2012-12-17 23:56 UTC
To: Eric W. Biederman
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA
Hello, Eric.
On Mon, Dec 17, 2012 at 03:54:08PM -0800, Eric W. Biederman wrote:
> Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> writes:
>
> > So, I'm *highly* unlikely to accept any patches which try to add
> > namespace support directly to cgroup in any form unless someone can
> > definitively show me this can't be done using FUSE or other userland
> > solutions.
>
> FUSE doesn't work in user namespaces. Nor have I seen any interest by
> FUSE developers to even comment on patches doing anything of the sort.
Why doesn't it? Is there any fundamental technical reason it can't
work or is it just because nobody is working on it?
> So at the moment FUSE is not useful as even a rhetorical point. FUSE is
> DOA.
I don't really care how it gets done but the barrier to adding NS
support to cgroupfs proper is gonna be very high.
Thanks.
--
tejun
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Gao feng @ 2012-12-18 5:37 UTC
To: Tejun Heo
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
lizefan-hv44wF8Li93QT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
glommer-bzQdu9zFT3WakBO8gow8eQ
Hello Tejun
On 2012/12/18 07:48, Tejun Heo wrote:
> Hello,
>
> On Mon, Dec 17, 2012 at 02:43:26PM +0800, Gao feng wrote:
>> Right now, if we mount cgroup inside a container, we get the host's
>> cgroup information, and we can even change the host's cgroups from
>> inside the container.
>>
>> So the container's resource controllers lose their effectiveness.
>>
>> This patchset tries to add container support for cgroup. The main
>> idea is to allocate a cgroup super-block for each cgroup hierarchy
>> mounted in a different pid namespace.
>>
>> The top cgroup of a container shares its css with the host. When
>> cgroup is mounted in a container, the tasks in that container are
>> attached to the top cgroup of the newly mounted hierarchy, and when
>> cgroup is unmounted in the container, those tasks are attached back
>> to the host's cgroup.
>>
>> Since the container can change the shared css through its cgroup
>> subsystem files, patch 8/9 disables write permission on the
>> container's top cgroup files. On my TODO list the container gets its
>> own css, at which point this problem disappears.
>>
>> This patchset is sent as an RFC; any comments are welcome. Maybe this
>> isn't the best solution; if you have a better one, please let me know.
>
> So, I'm *highly* unlikely to accept any patches which try to add
> namespace support directly to cgroup in any form unless someone can
> definitively show me this can't be done using FUSE or other userland
> solutions.
>
> cgroupfs is going to be an interface to expose the resource control
> facilities of the kernel and that's the extent the interface will be
> capable of. It in itself won't support delegation of resource
> policies to namespaces or unprivileged users.
>
> Although I don't have anything concrete yet, the tentative plan is to
> have something in userland which can integrate with the base system so
> that userland has a unified and controlled way to interact with
> cgroup which can be easily integrated with the rest of the base system
> and kernel has at least some level of interface isolation. Basically,
> something like libudev or libsysfs.
Following your advice, the container's interface would change: a
container would no longer be able to enable cgroup with the command
"mount -t cgroup -o xxx xxx /path".

Something like libudev or libsysfs could provide an interface for the
container to get and set cgroups, but the interface provided by
cgroupfs would lose its effectiveness in the container.
>
> So, if people want to allow NSes to control their subtrees of cgroups, I
> really want something in the userland which sits between the NSes and
> actual cgroup, and I bet things would actually be much better that
> way. cgroupfs seems to allow it but you can't really delegate
> management of subtree easily. Controllers would collapse with
> increasing level of nesting, root cgroups have different knobs or
> different interpretations of the same knobs, and siblings interact
> with each other, and I don't think making the cgroupfs interface generic
> enough so that it can be used for all that is desirable or a
> worthwhile effort.
>
I recognize it's not easy, but I just want the OS running in a
container to be usable in the same way as the host.

Thanks.
* Re: [RFC PATCH 0/9] Add container support for cgroup
From: Serge Hallyn @ 2012-12-19 21:39 UTC
To: Glauber Costa
Cc: Gao feng, cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
ebiederm-aS9lmoZGLiVWk0Htik3J/w
Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 12/17/2012 10:43 AM, Gao feng wrote:
> > Right now, if we mount cgroup inside a container, we get the host's
> > cgroup information, and we can even change the host's cgroups from
> > inside the container.
> >
> > So the container's resource controllers lose their effectiveness.
> >
> > This patchset tries to add container support for cgroup. The main
> > idea is to allocate a cgroup super-block for each cgroup hierarchy
> > mounted in a different pid namespace.
> >
> > The top cgroup of a container shares its css with the host. When
> > cgroup is mounted in a container, the tasks in that container are
> > attached to the top cgroup of the newly mounted hierarchy, and when
> > cgroup is unmounted in the container, those tasks are attached back
> > to the host's cgroup.
> >
> > Since the container can change the shared css through its cgroup
> > subsystem files, patch 8/9 disables write permission on the
> > container's top cgroup files. On my TODO list the container gets its
> > own css, at which point this problem disappears.
> >
> > This patchset is sent as an RFC; any comments are welcome. Maybe this
> > isn't the best solution; if you have a better one, please let me know.
>
>
> Question 1:
>
> Any particular reason to have picked the pid namespace?
>
> Maybe it is the right thing, since we are basically dealing with
> grouping of tasks.
Yes, but the pid namespace is more about naming of tasks than grouping
of tasks (ignoring the reaper). And the cgroup task files properly
translate pids. I don't think this is a good justification.
> OTOH, what you are doing sounds very much like
> a private mount, indicating that the mount namespace should be used.
> This needs to be well justified.
Agreed - though I'd prefer to avoid relying on an existing ns at all.
> Also, "container support" can really mean a lot of things. I am still
> trying, while reading your patches, to figure out what exactly you
> want to achieve. What it seems so far is that you want an unprivileged
> process living inside a namespace to manipulate the cgroup hierarchy
> and have its own copy of the cgroup tree, laid out as it pleases. You
> also want to be able to write PIDs as seen by the containing
> namespace, and to have them somehow translated. Am I right?
>
> For future submissions, could you make this clearer?
IMO, what we want is for a task to be able to say "from now on,
make my current cgroups the cgroup roots for myself and any newly
spawned children". After that, the directory mounted using 'mount
-t cgroup' and the output of /proc/self/cgroup should reflect the new
cgroups. Access to existing mounts should not be affected - leave
that to the user-namespace-enhanced DAC checks and to proper container
setup (i.e. unmounting old cgroup mounts), and trust good cgroup
hierarchies to do the rest.
The current RFC makes clone(CLONE_NEWPID) the way to say "make my
current cgroup the cgroup root." I think it would be simpler and
cleaner to use a new mount option, i.e. 'mount -t cgroup -o newroot',
to say "make my current cgroup the cgroup root for myself and all
my new children." The task->nsproxy could be enhanced with a pointer
to the new cgroupfs superblock (since I'm taking away the pidns as a
hint for finding the right cgroup root); a sketch follows below.
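Something minimal, with a hypothetical field name:

	/* include/linux/nsproxy.h */
	struct nsproxy {
		atomic_t count;
		struct uts_namespace *uts_ns;
		struct ipc_namespace *ipc_ns;
		struct mnt_namespace *mnt_ns;
		struct pid_namespace *pid_ns;
		struct net *net_ns;
		/*
		 * hypothetical: set by mount -t cgroup -o newroot,
		 * inherited by children at fork
		 */
		struct super_block *cgroup_sb;
	};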
BTW I'm not sure what the current plan for allowed subsys compositions
is, but depending on that we may need to watch out for the container
being able to DOS the host by making a bad composition.
-serge