* [PATCH 0/3][V2] remove the ns_cgroup
@ 2010-09-27 10:14 Daniel Lezcano
From: Daniel Lezcano @ 2010-09-27 10:14 UTC
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
The ns_cgroup is a control group that interacts with the namespaces.
When a new namespace is created, a corresponding cgroup is
automatically created too. The cgroup name is the pid of the process
that called 'unshare', or of the child created by 'clone'.
This cgroup is tied to the namespace because it prevents a
process from escaping the control group, and it uses the post_clone callback
so that the child cgroup inherits the values of the parent cgroup.
Unfortunately, the more we use this cgroup, the more problems we run
into:
(1) when a process unshares, the cgroup name may conflict with an existing
cgroup named after the same pid, so unshare or clone returns -EEXIST
(2) the cgroup creation is out of control, because an application may
create several namespaces, in which case the system automatically
creates several cgroups behind its back and leaves them on the cgroupfs (e.g. a vrf
based on the network namespace).
(3) the combination of (1) and (2) forces an administrator to regularly check and
clean up these cgroups.
This patchset removes the ns_cgroup by adding a new flag to the cgroup
and a matching cgroupfs mount option. The flag enables copying the parent
cgroup's values when a child cgroup is created. We can then safely remove the
ns_cgroup, as this flag provides compatibility. We now have to create the cgroup
and add the task to it manually, which is consistent with the cgroup framework.
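The manual "create the cgroup, then add the task" step could be sketched from userspace roughly as follows. This is only an illustration, not part of the patchset: the helper name cgroup_create_and_attach() is hypothetical, and it assumes 'root' is a mounted cgroup hierarchy (e.g. /cgroup) that exposes the usual 'tasks' file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create the cgroup <root>/<name> and move task
 * <pid> into it by writing the pid to the cgroup's 'tasks' file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int cgroup_create_and_attach(const char *root, const char *name,
				    pid_t pid)
{
	char path[4096];
	FILE *f;
	int n;

	assert(root != NULL && name != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", root, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	/* Creating the directory creates the cgroup; reuse it if present. */
	if (mkdir(path, 0755) < 0 && errno != EEXIST)
		return -errno;

	n = snprintf(path, sizeof(path), "%s/%s/tasks", root, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (f == NULL)
		return -errno;

	/* Writing a pid to 'tasks' attaches that task to the cgroup. */
	if (fprintf(f, "%d\n", (int)pid) < 0) {
		fclose(f);
		return -EIO;
	}
	if (fclose(f) != 0)
		return -errno;

	return 0;
}
```

A container launcher would call something like cgroup_create_and_attach("/cgroup", "job1", child_pid) right after clone(), instead of relying on the kernel to create a pid-named cgroup on its own.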
Changelog:
=========
* V2
Changed the following as Paul Menage suggested:
* removed the clone_children flag from the cgroupfs_root
* used the 'top_cgroup' to check whether 'clone_children' is set
for the mount option
* improved the description of the patch 2/3
* removed CONFIG_CGROUP_NS from the new default configs
* V1
initial post
Daniel Lezcano (3):
cgroup : add clone_children control file
cgroup : make the mount options parsing more accurate
cgroup : remove the ns_cgroup
Documentation/cgroups/cgroups.txt | 16 ++-
arch/arm/configs/tegra_defconfig | 1 -
arch/mips/configs/bcm47xx_defconfig | 1 -
arch/powerpc/configs/ppc6xx_defconfig | 1 -
arch/powerpc/configs/pseries_defconfig | 1 -
arch/s390/defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/se7206_defconfig | 1 -
arch/sh/configs/shx3_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
arch/x86/configs/i386_defconfig | 1 -
arch/x86/configs/x86_64_defconfig | 1 -
include/linux/cgroup.h | 7 +-
include/linux/cgroup_subsys.h | 6 -
include/linux/nsproxy.h | 9 --
init/Kconfig | 9 --
kernel/Makefile | 1 -
kernel/cgroup.c | 243 +++++++++++++-------------------
kernel/cpuset.c | 7 +-
kernel/fork.c | 6 -
kernel/ns_cgroup.c | 110 --------------
kernel/nsproxy.c | 4 -
22 files changed, 118 insertions(+), 311 deletions(-)
delete mode 100644 kernel/ns_cgroup.c
* [PATCH 1/3][V2] cgroup : add clone_children control file
From: Daniel Lezcano @ 2010-09-27 10:14 UTC
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Paul Menage, Eric W. Biederman
This patch is sent in response to a previous thread about the ns_cgroup:
https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html
It adds a 'clone_children' control file for a cgroup.
This control file is a boolean specifying whether a child cgroup should
be a clone of its parent cgroup. The default value is 'false'.
When the flag is set, creating a child cgroup calls the post_clone callback
of every subsystem that provides one.
At present, cpuset is the only subsystem that implements the post_clone
callback.
The flag can also be set at mount time by specifying the 'clone_children' mount
option.
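To make the intended semantics concrete, here is a small userspace model of the behavior described above. It is a sketch only: struct model_cgroup and the model_* functions are hypothetical stand-ins, not the kernel's struct cgroup or cgroup_create(), and the 'cpus' / 'mems' bitmasks merely imitate the values a cpuset-like post_clone would copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct cgroup. */
struct model_cgroup {
	bool clone_children;	/* models the CGRP_CLONE_CHILDREN flag */
	unsigned long cpus;	/* models cpuset 'cpus' as a bitmask */
	unsigned long mems;	/* models cpuset 'mems' as a bitmask */
};

/* Models a cpuset-like post_clone: copy the parent's values. */
static void model_post_clone(const struct model_cgroup *parent,
			     struct model_cgroup *child)
{
	child->cpus = parent->cpus;
	child->mems = parent->mems;
}

/*
 * Models child creation: the flag is inherited from the parent, and
 * post_clone only runs when the parent has clone_children set.
 */
static void model_cgroup_create(const struct model_cgroup *parent,
				struct model_cgroup *child)
{
	assert(parent != NULL && child != NULL);

	child->clone_children = parent->clone_children;
	child->cpus = 0;
	child->mems = 0;

	if (parent->clone_children)
		model_post_clone(parent, child);
}
```

Because the flag itself is inherited, a grandchild created under a cloned child is cloned as well, unless the flag is cleared on the intermediate cgroup first.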
Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
Documentation/cgroups/cgroups.txt | 14 +++++++++++-
include/linux/cgroup.h | 4 +++
kernel/cgroup.c | 39 +++++++++++++++++++++++++++++++++++++
3 files changed, 55 insertions(+), 2 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index b34823f..190018b 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -18,7 +18,8 @@ CONTENTS:
1.2 Why are cgroups needed ?
1.3 How are cgroups implemented ?
1.4 What does notify_on_release do ?
- 1.5 How do I use cgroups ?
+ 1.5 What does clone_children do ?
+ 1.6 How do I use cgroups ?
2. Usage Examples and Syntax
2.1 Basic Usage
2.2 Attaching processes
@@ -293,7 +294,16 @@ notify_on_release in the root cgroup at system boot is disabled
value of their parents notify_on_release setting. The default value of
a cgroup hierarchy's release_agent path is empty.
-1.5 How do I use cgroups ?
+1.5 What does clone_children do ?
+---------------------------------
+
+If the clone_children flag is enabled (1) in a cgroup, then all
+cgroups created beneath will call the post_clone callbacks for each
+subsystem of the newly created cgroup. Usually when this callback is
+implemented for a subsystem, it copies the values of the parent
+subsystem, this is the case for the cpuset.
+
+1.6 How do I use cgroups ?
--------------------------
To start a new job that is to be contained within a cgroup, using
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 709dfb9..ed4ba11 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -154,6 +154,10 @@ enum {
* A thread in rmdir() is wating for this cgroup.
*/
CGRP_WAIT_ON_RMDIR,
+ /*
+ * Clone cgroup values when creating a new child cgroup
+ */
+ CGRP_CLONE_CHILDREN,
};
/* which pidlist file are we talking about? */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7b69b8d..7b17c3e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -243,6 +243,11 @@ static int notify_on_release(const struct cgroup *cgrp)
return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
}
+static int clone_children(const struct cgroup *cgrp)
+{
+ return test_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+}
+
/*
* for_each_subsys() allows you to iterate on each subsystem attached to
* an active hierarchy
@@ -1039,6 +1044,8 @@ static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",noprefix");
if (strlen(root->release_agent_path))
seq_printf(seq, ",release_agent=%s", root->release_agent_path);
+ if (clone_children(&root->top_cgroup))
+ seq_puts(seq, ",clone_children");
if (strlen(root->name))
seq_printf(seq, ",name=%s", root->name);
mutex_unlock(&cgroup_mutex);
@@ -1049,6 +1056,7 @@ struct cgroup_sb_opts {
unsigned long subsys_bits;
unsigned long flags;
char *release_agent;
+ bool clone_children;
char *name;
/* User explicitly requested empty subsystem */
bool none;
@@ -1096,6 +1104,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
opts->none = true;
} else if (!strcmp(token, "noprefix")) {
set_bit(ROOT_NOPREFIX, &opts->flags);
+ } else if (!strcmp(token, "clone_children")) {
+ opts->clone_children = true;
} else if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */
if (opts->release_agent)
@@ -1354,6 +1364,8 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
strcpy(root->release_agent_path, opts->release_agent);
if (opts->name)
strcpy(root->name, opts->name);
+ if (opts->clone_children)
+ set_bit(CGRP_CLONE_CHILDREN, &root->top_cgroup.flags);
return root;
}
@@ -3172,6 +3184,23 @@ fail:
return ret;
}
+static u64 cgroup_clone_children_read(struct cgroup *cgrp,
+ struct cftype *cft)
+{
+ return clone_children(cgrp);
+}
+
+static int cgroup_clone_children_write(struct cgroup *cgrp,
+ struct cftype *cft,
+ u64 val)
+{
+ if (val)
+ set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+ else
+ clear_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+ return 0;
+}
+
/*
* for the common functions, 'private' gives the type of file
*/
@@ -3202,6 +3231,11 @@ static struct cftype files[] = {
.write_string = cgroup_write_event_control,
.mode = S_IWUGO,
},
+ {
+ .name = "cgroup.clone_children",
+ .read_u64 = cgroup_clone_children_read,
+ .write_u64 = cgroup_clone_children_write,
+ },
};
static struct cftype cft_release_agent = {
@@ -3331,6 +3365,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
if (notify_on_release(parent))
set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
+ if (clone_children(parent))
+ set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+
for_each_subsys(root, ss) {
struct cgroup_subsys_state *css = ss->create(ss, cgrp);
@@ -3345,6 +3382,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
goto err_destroy;
}
/* At error, ->destroy() callback has to free assigned ID. */
+ if (clone_children(parent) && ss->post_clone)
+ ss->post_clone(ss, cgrp);
}
cgroup_lock_hierarchy(root);
--
1.7.0.4
* [PATCH 2/3][V2] cgroup : make the mount options parsing more accurate
From: Daniel Lezcano @ 2010-09-27 10:14 UTC
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Current behavior:
=================
(1) When we mount a cgroup, we can specify the 'all' option, which
enables all the cgroup subsystems. This is the default when
no option is specified.
(2) To mount a cgroup with a subset of the supported cgroup
subsystems, we have to list the subsystem names in the mount
options.
(3) If we specify another option like 'noprefix' or 'release_agent', the
current code also requires 'all' or a subsystem name to be specified.
This is not critical, but it is unfriendly, as we should assume (1) in this case.
(4) Logically, the 'all' option is mutually exclusive with a subsystem
name, but this is not detected.
In other words:
succeeds : mount -t cgroup -o all,freezer cgroup /cgroup
=> is it 'all' or 'freezer' ?
fails : mount -t cgroup -o noprefix cgroup /cgroup
=> succeeds if we do '-o noprefix,all'
This patch consolidates the mount options checks.
New behavior:
=============
(1) untouched
(2) untouched
(3) 'all' becomes the default when only options other than
a subsystem name are specified
(4) raises an error
In other words:
fails : mount -t cgroup -o all,freezer cgroup /cgroup
succeeds : mount -t cgroup -o noprefix cgroup /cgroup
For the sake of readability, the if ... then ... else ... if ...
chain used when parsing the options has been changed to:
if ... then
...
continue
fi
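The rules above can be exercised with a small userspace model of the parser. This is a sketch under stated assumptions, not the kernel code: the subsystem table, struct model_opts and model_parse_options() are hypothetical, only a few options are modeled, and the default-to-'all' step mirrors the description in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem table for the model; the kernel's list differs. */
static const char *const model_subsys[] = { "cpuset", "freezer", "devices" };
#define MODEL_SUBSYS_COUNT (sizeof(model_subsys) / sizeof(model_subsys[0]))

struct model_opts {
	unsigned long subsys_bits;
	bool none;
	bool noprefix;
};

/* True if the token [tok, tok + len) is exactly the string 'name'. */
static bool token_is(const char *tok, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(tok, name, len) == 0;
}

/* Returns 0 on success, -EINVAL or -ENOENT on a rejected option string. */
static int model_parse_options(const char *data, struct model_opts *opts)
{
	bool all_ss = false, one_ss = false;
	const char *tok, *next;
	size_t i;

	assert(opts != NULL);
	memset(opts, 0, sizeof(*opts));

	for (tok = data; tok != NULL; tok = next) {
		const char *comma = strchr(tok, ',');
		size_t len = comma ? (size_t)(comma - tok) : strlen(tok);

		next = comma ? comma + 1 : NULL;

		/* Empty tokens are rejected. */
		if (len == 0)
			return -EINVAL;
		if (token_is(tok, len, "none")) {
			opts->none = true;
			continue;
		}
		if (token_is(tok, len, "all")) {
			/* 'all' and a subsystem name are mutually exclusive */
			if (one_ss)
				return -EINVAL;
			all_ss = true;
			continue;
		}
		if (token_is(tok, len, "noprefix")) {
			opts->noprefix = true;
			continue;
		}
		for (i = 0; i < MODEL_SUBSYS_COUNT; i++) {
			if (!token_is(tok, len, model_subsys[i]))
				continue;
			if (all_ss)
				return -EINVAL;
			opts->subsys_bits |= 1UL << i;
			one_ss = true;
			break;
		}
		if (i == MODEL_SUBSYS_COUNT)
			return -ENOENT;
	}

	/* Neither 'all', 'none' nor a subsystem name: default to 'all'. */
	if (all_ss || (!one_ss && !opts->none))
		opts->subsys_bits = (1UL << MODEL_SUBSYS_COUNT) - 1;

	return 0;
}
```

With this model, "all,freezer" and "freezer,all" are both rejected with -EINVAL, while "noprefix" alone succeeds and selects every subsystem in the table.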
Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Reviewed-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Reviewed-by: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
kernel/cgroup.c | 90 ++++++++++++++++++++++++++++++++++++------------------
1 files changed, 60 insertions(+), 30 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7b17c3e..9eace43 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1073,7 +1073,8 @@ struct cgroup_sb_opts {
*/
static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
{
- char *token, *o = data ?: "all";
+ char *token, *o = data;
+ bool all_ss = false, one_ss = false;
unsigned long mask = (unsigned long)-1;
int i;
bool module_pin_failed = false;
@@ -1089,24 +1090,27 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
while ((token = strsep(&o, ",")) != NULL) {
if (!*token)
return -EINVAL;
- if (!strcmp(token, "all")) {
- /* Add all non-disabled subsystems */
- opts->subsys_bits = 0;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- struct cgroup_subsys *ss = subsys[i];
- if (ss == NULL)
- continue;
- if (!ss->disabled)
- opts->subsys_bits |= 1ul << i;
- }
- } else if (!strcmp(token, "none")) {
+ if (!strcmp(token, "none")) {
/* Explicitly have no subsystems */
opts->none = true;
- } else if (!strcmp(token, "noprefix")) {
+ continue;
+ }
+ if (!strcmp(token, "all")) {
+ /* Mutually exclusive option 'all' + subsystem name */
+ if (one_ss)
+ return -EINVAL;
+ all_ss = true;
+ continue;
+ }
+ if (!strcmp(token, "noprefix")) {
set_bit(ROOT_NOPREFIX, &opts->flags);
- } else if (!strcmp(token, "clone_children")) {
+ continue;
+ }
+ if (!strcmp(token, "clone_children")) {
opts->clone_children = true;
- } else if (!strncmp(token, "release_agent=", 14)) {
+ continue;
+ }
+ if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */
if (opts->release_agent)
return -EINVAL;
@@ -1114,7 +1118,9 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
kstrndup(token + 14, PATH_MAX - 1, GFP_KERNEL);
if (!opts->release_agent)
return -ENOMEM;
- } else if (!strncmp(token, "name=", 5)) {
+ continue;
+ }
+ if (!strncmp(token, "name=", 5)) {
const char *name = token + 5;
/* Can't specify an empty name */
if (!strlen(name))
@@ -1136,20 +1142,44 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
GFP_KERNEL);
if (!opts->name)
return -ENOMEM;
- } else {
- struct cgroup_subsys *ss;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- ss = subsys[i];
- if (ss == NULL)
- continue;
- if (!strcmp(token, ss->name)) {
- if (!ss->disabled)
- set_bit(i, &opts->subsys_bits);
- break;
- }
- }
- if (i == CGROUP_SUBSYS_COUNT)
- return -ENOENT;
+
+ continue;
+ }
+
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
+ if (strcmp(token, ss->name))
+ continue;
+ if (ss->disabled)
+ continue;
+
+ /* Mutually exclusive option 'all' + subsystem name */
+ if (all_ss)
+ return -EINVAL;
+ set_bit(i, &opts->subsys_bits);
+ one_ss = true;
+
+ break;
+ }
+ if (i == CGROUP_SUBSYS_COUNT)
+ return -ENOENT;
+ }
+
+ /*
+ * If the 'all' option was specified select all the subsystems,
+ * otherwise 'all, 'none' and a subsystem name options were not
+ * specified, let's default to 'all'
+ */
+ if (all_ss || (!all_ss && !one_ss && !opts->none)) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
+ if (ss->disabled)
+ continue;
+ set_bit(i, &opts->subsys_bits);
}
}
--
1.7.0.4
* [PATCH 3/3][V2] cgroup : remove the ns_cgroup
From: Daniel Lezcano @ 2010-09-27 10:14 UTC
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier.
For example, a single process cannot handle a large number of namespaces
without interacting with this cgroup and falling into an exponential creation
time due to the nested cgroup directory depth (e.g. /cgroup/<pid>/.../<pid>/...).
This was spotted when creating a single process using multiple network namespaces.
The objective was 4096 network namespaces, but at 820 netns the creation
became dramatically slow: the creation time for a namespace increased from 10 msec
to 10 sec. After five hours, the expected number of netns had not been reached.
Without the ns_cgroup interaction, 4K netns are created in 2 minutes.
To work around that, we have to mount the cgroup with all the subsystems
except the ns_cgroup. This is a little weird and hard to manage from an administration
point of view, because we have to know which cgroup subsystems are available on the
system and we can't do a simple 'mount -t cgroup cgroup /cgroup'.
With the previous patch, which adds a 'clone_children' parameter to a cgroup,
we should be able to remove the ns_cgroup and manually manage creating a cgroup
and adding a task to it, consistently with the rest of the subsystems.
This patch removes the ns_cgroup as suggested in the following thread:
https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
The 'cgroup_clone' function is removed because it is no longer used.
Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Jamal Hadi Salim <hadi-fAAogVwAN2Kw5LPnMra/2Q@public.gmane.org>
Reviewed-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Acked-by: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
Documentation/cgroups/cgroups.txt | 2 +-
arch/arm/configs/tegra_defconfig | 1 -
arch/mips/configs/bcm47xx_defconfig | 1 -
arch/powerpc/configs/ppc6xx_defconfig | 1 -
arch/powerpc/configs/pseries_defconfig | 1 -
arch/s390/defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/se7206_defconfig | 1 -
arch/sh/configs/shx3_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
arch/x86/configs/i386_defconfig | 1 -
arch/x86/configs/x86_64_defconfig | 1 -
include/linux/cgroup.h | 3 -
include/linux/cgroup_subsys.h | 6 --
include/linux/nsproxy.h | 9 ---
init/Kconfig | 9 ---
kernel/Makefile | 1 -
kernel/cgroup.c | 116 --------------------------------
kernel/cpuset.c | 7 +-
kernel/fork.c | 6 --
kernel/ns_cgroup.c | 110 ------------------------------
kernel/nsproxy.c | 4 -
22 files changed, 4 insertions(+), 280 deletions(-)
delete mode 100644 kernel/ns_cgroup.c
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..6a5ba63 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -618,7 +618,7 @@ always handled well.
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller)
-Called at the end of cgroup_clone() to do any parameter
+Called during cgroup_create() to do any parameter
initialization which might be required before a task could attach. For
example in cpusets, no task may attach before 'cpus' and 'mems' are set
up.
diff --git a/arch/arm/configs/tegra_defconfig b/arch/arm/configs/tegra_defconfig
index c81b6d9..ebb8c55 100644
--- a/arch/arm/configs/tegra_defconfig
+++ b/arch/arm/configs/tegra_defconfig
@@ -65,7 +65,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-# CONFIG_CGROUP_NS is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
diff --git a/arch/mips/configs/bcm47xx_defconfig b/arch/mips/configs/bcm47xx_defconfig
index 927d58b..c4338e0 100644
--- a/arch/mips/configs/bcm47xx_defconfig
+++ b/arch/mips/configs/bcm47xx_defconfig
@@ -16,7 +16,6 @@ CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_AUDIT=y
CONFIG_TINY_RCU=y
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig
index 9d64a68..9b253f6 100644
--- a/arch/powerpc/configs/ppc6xx_defconfig
+++ b/arch/powerpc/configs/ppc6xx_defconfig
@@ -10,7 +10,6 @@ CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_AUDIT=y
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index f87f0e1..972587f 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -15,7 +15,6 @@ CONFIG_AUDITSYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
diff --git a/arch/s390/defconfig b/arch/s390/defconfig
index e40ac6e..4b6d1a1 100644
--- a/arch/s390/defconfig
+++ b/arch/s390/defconfig
@@ -5,7 +5,6 @@ CONFIG_AUDIT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
diff --git a/arch/sh/configs/sdk7786_defconfig b/arch/sh/configs/sdk7786_defconfig
index dc4a2eb..9fdabe2 100644
--- a/arch/sh/configs/sdk7786_defconfig
+++ b/arch/sh/configs/sdk7786_defconfig
@@ -12,7 +12,6 @@ CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
index a468ff2..72c3fad 100644
--- a/arch/sh/configs/se7206_defconfig
+++ b/arch/sh/configs/se7206_defconfig
@@ -8,7 +8,6 @@ CONFIG_RCU_TRACE=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
index 3f92d37..6bb4130 100644
--- a/arch/sh/configs/shx3_defconfig
+++ b/arch/sh/configs/shx3_defconfig
@@ -9,7 +9,6 @@ CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
diff --git a/arch/sh/configs/urquell_defconfig b/arch/sh/configs/urquell_defconfig
index 7b3daec..8bfa4d0 100644
--- a/arch/sh/configs/urquell_defconfig
+++ b/arch/sh/configs/urquell_defconfig
@@ -9,7 +9,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index 6f98726..2bf1805 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -10,7 +10,6 @@ CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_AUDIT=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index ee01a9d..22a0dc8 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -11,7 +11,6 @@ CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_AUDIT=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_CGROUPS=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed4ba11..4170663 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -552,9 +552,6 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
return task_subsys_state(task, subsys_id)->cgroup;
}
-int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *ss,
- char *nodename);
-
/* A cgroup_iter should be treated as an opaque object */
struct cgroup_iter {
struct list_head *cg_link;
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ccefff0..4ba5259 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -19,12 +19,6 @@ SUBSYS(debug)
/* */
-#ifdef CONFIG_CGROUP_NS
-SUBSYS(ns)
-#endif
-
-/* */
-
#ifdef CONFIG_CGROUP_SCHED
SUBSYS(cpu_cgroup)
#endif
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 7b370c7..50d20ab 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -81,13 +81,4 @@ static inline void get_nsproxy(struct nsproxy *ns)
atomic_inc(&ns->count);
}
-#ifdef CONFIG_CGROUP_NS
-int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid);
-#else
-static inline int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid)
-{
- return 0;
-}
-#endif
-
#endif
diff --git a/init/Kconfig b/init/Kconfig
index 0859284..913c379 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -518,15 +518,6 @@ config CGROUP_DEBUG
Say N if unsure.
-config CGROUP_NS
- bool "Namespace cgroup subsystem"
- depends on CGROUPS
- help
- Provides a simple namespace cgroup subsystem to
- provide hierarchical naming of sets of namespaces,
- for instance virtual servers and checkpoint/restart
- jobs.
-
config CGROUP_FREEZER
bool "Freezer cgroup subsystem"
depends on CGROUPS
diff --git a/kernel/Makefile b/kernel/Makefile
index 92b7420..a390a76 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -61,7 +61,6 @@ obj-$(CONFIG_COMPAT) += compat.o
obj-$(CONFIG_CGROUPS) += cgroup.o
obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
obj-$(CONFIG_CPUSETS) += cpuset.o
-obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 9eace43..dd0210f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4242,122 +4242,6 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
}
/**
- * cgroup_clone - clone the cgroup the given subsystem is attached to
- * @tsk: the task to be moved
- * @subsys: the given subsystem
- * @nodename: the name for the new cgroup
- *
- * Duplicate the current cgroup in the hierarchy that the given
- * subsystem is attached to, and move this task into the new
- * child.
- */
-int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *subsys,
- char *nodename)
-{
- struct dentry *dentry;
- int ret = 0;
- struct cgroup *parent, *child;
- struct inode *inode;
- struct css_set *cg;
- struct cgroupfs_root *root;
- struct cgroup_subsys *ss;
-
- /* We shouldn't be called by an unregistered subsystem */
- BUG_ON(!subsys->active);
-
- /* First figure out what hierarchy and cgroup we're dealing
- * with, and pin them so we can drop cgroup_mutex */
- mutex_lock(&cgroup_mutex);
- again:
- root = subsys->root;
- if (root == &rootnode) {
- mutex_unlock(&cgroup_mutex);
- return 0;
- }
-
- /* Pin the hierarchy */
- if (!atomic_inc_not_zero(&root->sb->s_active)) {
- /* We race with the final deactivate_super() */
- mutex_unlock(&cgroup_mutex);
- return 0;
- }
-
- /* Keep the cgroup alive */
- task_lock(tsk);
- parent = task_cgroup(tsk, subsys->subsys_id);
- cg = tsk->cgroups;
- get_css_set(cg);
- task_unlock(tsk);
-
- mutex_unlock(&cgroup_mutex);
-
- /* Now do the VFS work to create a cgroup */
- inode = parent->dentry->d_inode;
-
- /* Hold the parent directory mutex across this operation to
- * stop anyone else deleting the new cgroup */
- mutex_lock(&inode->i_mutex);
- dentry = lookup_one_len(nodename, parent->dentry, strlen(nodename));
- if (IS_ERR(dentry)) {
- printk(KERN_INFO
- "cgroup: Couldn't allocate dentry for %s: %ld\n", nodename,
- PTR_ERR(dentry));
- ret = PTR_ERR(dentry);
- goto out_release;
- }
-
- /* Create the cgroup directory, which also creates the cgroup */
- ret = vfs_mkdir(inode, dentry, 0755);
- child = __d_cgrp(dentry);
- dput(dentry);
- if (ret) {
- printk(KERN_INFO
- "Failed to create cgroup %s: %d\n", nodename,
- ret);
- goto out_release;
- }
-
- /* The cgroup now exists. Retake cgroup_mutex and check
- * that we're still in the same state that we thought we
- * were. */
- mutex_lock(&cgroup_mutex);
- if ((root != subsys->root) ||
- (parent != task_cgroup(tsk, subsys->subsys_id))) {
- /* Aargh, we raced ... */
- mutex_unlock(&inode->i_mutex);
- put_css_set(cg);
-
- deactivate_super(root->sb);
- /* The cgroup is still accessible in the VFS, but
- * we're not going to try to rmdir() it at this
- * point. */
- printk(KERN_INFO
- "Race in cgroup_clone() - leaking cgroup %s\n",
- nodename);
- goto again;
- }
-
- /* do any required auto-setup */
- for_each_subsys(root, ss) {
- if (ss->post_clone)
- ss->post_clone(ss, child);
- }
-
- /* All seems fine. Finish by moving the task into the new cgroup */
- ret = cgroup_attach_task(child, tsk);
- mutex_unlock(&cgroup_mutex);
-
- out_release:
- mutex_unlock(&inode->i_mutex);
-
- mutex_lock(&cgroup_mutex);
- put_css_set(cg);
- mutex_unlock(&cgroup_mutex);
- deactivate_super(root->sb);
- return ret;
-}
-
-/**
* cgroup_is_descendant - see if @cgrp is a descendant of @task's cgrp
* @cgrp: the cgroup in question
* @task: the task in question
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b23c097..1309fe0 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1829,10 +1829,9 @@ static int cpuset_populate(struct cgroup_subsys *ss, struct cgroup *cont)
}
/*
- * post_clone() is called at the end of cgroup_clone().
- * 'cgroup' was just created automatically as a result of
- * a cgroup_clone(), and the current task is about to
- * be moved into 'cgroup'.
+ * post_clone() is called during cgroup_create() when the
+ * clone_children mount argument was specified. The cgroup
+ * can not yet have any tasks.
*
* Currently we refuse to set up the cgroup - thereby
* refusing the task to be entered, and as a result refusing
diff --git a/kernel/fork.c b/kernel/fork.c
index c445f8c..623b9c1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1171,12 +1171,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (clone_flags & CLONE_THREAD)
p->tgid = current->tgid;
- if (current->nsproxy != p->nsproxy) {
- retval = ns_cgroup_clone(p, pid);
- if (retval)
- goto bad_fork_free_pid;
- }
-
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
/*
* Clear TID on mm_release()?
diff --git a/kernel/ns_cgroup.c b/kernel/ns_cgroup.c
deleted file mode 100644
index 2a5dfec..0000000
--- a/kernel/ns_cgroup.c
+++ /dev/null
@@ -1,110 +0,0 @@
-/*
- * ns_cgroup.c - namespace cgroup subsystem
- *
- * Copyright 2006, 2007 IBM Corp
- */
-
-#include <linux/module.h>
-#include <linux/cgroup.h>
-#include <linux/fs.h>
-#include <linux/proc_fs.h>
-#include <linux/slab.h>
-#include <linux/nsproxy.h>
-
-struct ns_cgroup {
- struct cgroup_subsys_state css;
-};
-
-struct cgroup_subsys ns_subsys;
-
-static inline struct ns_cgroup *cgroup_to_ns(
- struct cgroup *cgroup)
-{
- return container_of(cgroup_subsys_state(cgroup, ns_subsys_id),
- struct ns_cgroup, css);
-}
-
-int ns_cgroup_clone(struct task_struct *task, struct pid *pid)
-{
- char name[PROC_NUMBUF];
-
- snprintf(name, PROC_NUMBUF, "%d", pid_vnr(pid));
- return cgroup_clone(task, &ns_subsys, name);
-}
-
-/*
- * Rules:
- * 1. you can only enter a cgroup which is a descendant of your current
- * cgroup
- * 2. you can only place another process into a cgroup if
- * a. you have CAP_SYS_ADMIN
- * b. your cgroup is an ancestor of task's destination cgroup
- * (hence either you are in the same cgroup as task, or in an
- * ancestor cgroup thereof)
- */
-static int ns_can_attach(struct cgroup_subsys *ss, struct cgroup *new_cgroup,
- struct task_struct *task, bool threadgroup)
-{
- if (current != task) {
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
- if (!cgroup_is_descendant(new_cgroup, current))
- return -EPERM;
- }
-
- if (!cgroup_is_descendant(new_cgroup, task))
- return -EPERM;
-
- if (threadgroup) {
- struct task_struct *c;
- rcu_read_lock();
- list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
- if (!cgroup_is_descendant(new_cgroup, c)) {
- rcu_read_unlock();
- return -EPERM;
- }
- }
- rcu_read_unlock();
- }
-
- return 0;
-}
-
-/*
- * Rules: you can only create a cgroup if
- * 1. you are capable(CAP_SYS_ADMIN)
- * 2. the target cgroup is a descendant of your own cgroup
- */
-static struct cgroup_subsys_state *ns_create(struct cgroup_subsys *ss,
- struct cgroup *cgroup)
-{
- struct ns_cgroup *ns_cgroup;
-
- if (!capable(CAP_SYS_ADMIN))
- return ERR_PTR(-EPERM);
- if (!cgroup_is_descendant(cgroup, current))
- return ERR_PTR(-EPERM);
-
- ns_cgroup = kzalloc(sizeof(*ns_cgroup), GFP_KERNEL);
- if (!ns_cgroup)
- return ERR_PTR(-ENOMEM);
- return &ns_cgroup->css;
-}
-
-static void ns_destroy(struct cgroup_subsys *ss,
- struct cgroup *cgroup)
-{
- struct ns_cgroup *ns_cgroup;
-
- ns_cgroup = cgroup_to_ns(cgroup);
- kfree(ns_cgroup);
-}
-
-struct cgroup_subsys ns_subsys = {
- .name = "ns",
- .can_attach = ns_can_attach,
- .create = ns_create,
- .destroy = ns_destroy,
- .subsys_id = ns_subsys_id,
-};
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f74e6c0..014a90d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -198,10 +198,6 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
goto out;
}
- err = ns_cgroup_clone(current, task_pid(current));
- if (err)
- put_nsproxy(*new_nsp);
-
out:
return err;
}
--
1.7.0.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3][V2] cgroup : add clone_children control file
[not found] ` <1285582453-6127-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
@ 2010-09-27 13:34 ` Serge E. Hallyn
2010-09-27 14:11 ` Balbir Singh
2010-09-29 23:12 ` Paul Menage
2 siblings, 0 replies; 17+ messages in thread
From: Serge E. Hallyn @ 2010-09-27 13:34 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Serge E. Hallyn, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
Paul Menage,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
Quoting Daniel Lezcano (daniel.lezcano-GANU6spQydw@public.gmane.org):
> This patch is sent as an answer to a previous thread around the ns_cgroup.
>
> https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html
>
> It adds a control file 'clone_children' for a cgroup.
> This control file is a boolean specifying if the child cgroup should
> be a clone of the parent cgroup or not. The default value is 'false'.
>
> This flag makes the child cgroup to call the post_clone callback of all
> the subsystem, if it is available.
>
> At present, the cpuset is the only one which had implemented the post_clone
> callback.
>
> The option can be set at mount time by specifying the 'clone_children' mount
> option.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
> Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Paul and Eric, do you have any objections to this set? Patch 2 in
particular will make it easier to use both libvirt-lxc and lxc.sf.net
containers at the same time, without having to reboot.
> Reviewed-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
thanks,
-serge
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3][V2] cgroup : add clone_children control file
[not found] ` <1285582453-6127-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2010-09-27 13:34 ` Serge E. Hallyn
@ 2010-09-27 14:11 ` Balbir Singh
2010-09-29 23:12 ` Paul Menage
2 siblings, 0 replies; 17+ messages in thread
From: Balbir Singh @ 2010-09-27 14:11 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Serge E. Hallyn, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
Paul Menage,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
On Mon, Sep 27, 2010 at 3:44 PM, Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
>
> This patch is sent as an answer to a previous thread around the ns_cgroup.
>
> https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html
>
> It adds a control file 'clone_children' for a cgroup.
> This control file is a boolean specifying if the child cgroup should
> be a clone of the parent cgroup or not. The default value is 'false'.
>
> This flag makes the child cgroup to call the post_clone callback of all
> the subsystem, if it is available.
>
> At present, the cpuset is the only one which had implemented the post_clone
> callback.
>
> The option can be set at mount time by specifying the 'clone_children' mount
> option.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
> Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
This feature is very useful, thanks for looking into this. The only
comment I have is that the name says it clones values, whereas it
actually gives cgroup controllers the opportunity to do so, or to do
anything else, after create succeeds.
Acked-by: Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <1285582453-6127-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
` (2 preceding siblings ...)
2010-09-27 10:14 ` [PATCH 3/3][V2] cgroup : remove the ns_cgroup Daniel Lezcano
@ 2010-09-27 19:57 ` Andrew Morton
[not found] ` <20100927125741.0df22f09.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
3 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-09-27 19:57 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Paul Menage, Eric W. Biederman
On Mon, 27 Sep 2010 12:14:10 +0200
Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
> The ns_cgroup is a control group interacting with the namespaces.
> When a new namespace is created, a corresponding cgroup is
> automatically created too. The cgroup name is the pid of the process
> who did 'unshare' or the child of 'clone'.
>
> This cgroup is tied with the namespace because it prevents a
> process to escape the control group and use the post_clone callback,
> so the child cgroup inherits the values of the parent cgroup.
>
> Unfortunately, the more we use this cgroup and the more we are facing
> problems with it:
>
> (1) when a process unshares, the cgroup name may conflict with a previous
> cgroup with the same pid, so unshare or clone return -EEXIST
>
> (2) the cgroup creation is out of control because there may have an
> application creating several namespaces where the system will automatically
> create several cgroups in his back and let them on the cgroupfs (eg. a vrf
> based on the network namespace).
>
> (3) the mix of (1) and (2) force an administrator to regularly check and
> clean these cgroups.
>
> This patchset removes the ns_cgroup by adding a new flag to the cgroup
> and the cgroupfs mount option. It enables the copy of the parent cgroup
> when a child cgroup is created. We can then safely remove the ns_cgroup as
> this flag brings a compatibility. We have now to manually create and add the
> task to a cgroup, which is consistent with the cgroup framework.
So this is a non-backward-compatible userspace-visible change?
What are the implications of this?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927125741.0df22f09.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2010-09-27 20:36 ` Serge E. Hallyn
[not found] ` <20100927203658.GA5320-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2010-09-27 20:43 ` Daniel Lezcano
1 sibling, 1 reply; 17+ messages in thread
From: Serge E. Hallyn @ 2010-09-27 20:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman, Paul Menage
Quoting Andrew Morton (akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org):
> On Mon, 27 Sep 2010 12:14:10 +0200
> Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
>
> > The ns_cgroup is a control group interacting with the namespaces.
> > When a new namespace is created, a corresponding cgroup is
> > automatically created too. The cgroup name is the pid of the process
> > who did 'unshare' or the child of 'clone'.
> >
> > This cgroup is tied with the namespace because it prevents a
> > process to escape the control group and use the post_clone callback,
> > so the child cgroup inherits the values of the parent cgroup.
> >
> > Unfortunately, the more we use this cgroup and the more we are facing
> > problems with it:
> >
> > (1) when a process unshares, the cgroup name may conflict with a previous
> > cgroup with the same pid, so unshare or clone return -EEXIST
> >
> > (2) the cgroup creation is out of control because there may have an
> > application creating several namespaces where the system will automatically
> > create several cgroups in his back and let them on the cgroupfs (eg. a vrf
> > based on the network namespace).
> >
> > (3) the mix of (1) and (2) force an administrator to regularly check and
> > clean these cgroups.
> >
> > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> > and the cgroupfs mount option. It enables the copy of the parent cgroup
> > when a child cgroup is created. We can then safely remove the ns_cgroup as
> > this flag brings a compatibility. We have now to manually create and add the
> > task to a cgroup, which is consistent with the cgroup framework.
>
> So this is a non-backward-compatible userspace-visible change?
Yes, it is.
Patch 1 is needed to let lxc and libvirt both control containers with
same cgroup setup. Patch 3 however isn't *necessary* for that. Daniel,
what do you think about holding off on patch 3?
> What are the implications of this?
The ns cgroup does 2 things which no other cgroup does: (1) it
moves tasks into a child cgroup any time they unshare or clone
a namespace. And (2) it prevents them from moving up to a parent
cgroup. The latter in particular makes it the only way, without
using an LSM, of locking root into a cgroup, until user namespaces
are further developed (*).
-serge
(*) - Maybe something to add to that new kernel todo list
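The second property above (a task can move down into a child cgroup but
never back up) is what the cgroup_is_descendant() checks in the removed
ns_can_attach() enforce. A minimal userspace model of that rule, assuming
a hypothetical 'struct cg_node' with a parent pointer in place of the
kernel's struct cgroup, and leaving out the CAP_SYS_ADMIN check:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct cg_node {
	const char *name;
	struct cg_node *parent;	/* NULL for the root of the hierarchy */
};

/* True if 'target' is 'current' itself or lies somewhere below it. */
bool cg_is_descendant(const struct cg_node *target,
		      const struct cg_node *current)
{
	/* Walk up from the target until we meet 'current' or run out. */
	while (target && target != current)
		target = target->parent;
	return target != NULL;
}

/*
 * Model of the attach rule: moving into the task's own subtree is
 * allowed, moving up or sideways is refused with -EPERM.
 */
int cg_can_attach(const struct cg_node *dest, const struct cg_node *task_cg)
{
	return cg_is_descendant(dest, task_cg) ? 0 : -EPERM;
}
```

With a chain root -> a -> b, a task sitting in 'a' may attach to 'b' or
stay in 'a', but an attempt to attach it to 'root' is refused.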
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927125741.0df22f09.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-27 20:36 ` Serge E. Hallyn
@ 2010-09-27 20:43 ` Daniel Lezcano
1 sibling, 0 replies; 17+ messages in thread
From: Daniel Lezcano @ 2010-09-27 20:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Paul Menage, Eric W. Biederman
On 09/27/2010 09:57 PM, Andrew Morton wrote:
> On Mon, 27 Sep 2010 12:14:10 +0200
> Daniel Lezcano<daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
>
>
>> The ns_cgroup is a control group interacting with the namespaces.
>> When a new namespace is created, a corresponding cgroup is
>> automatically created too. The cgroup name is the pid of the process
>> who did 'unshare' or the child of 'clone'.
>>
>> This cgroup is tied with the namespace because it prevents a
>> process to escape the control group and use the post_clone callback,
>> so the child cgroup inherits the values of the parent cgroup.
>>
>> Unfortunately, the more we use this cgroup and the more we are facing
>> problems with it:
>>
>> (1) when a process unshares, the cgroup name may conflict with a previous
>> cgroup with the same pid, so unshare or clone return -EEXIST
>>
>> (2) the cgroup creation is out of control because there may have an
>> application creating several namespaces where the system will automatically
>> create several cgroups in his back and let them on the cgroupfs (eg. a vrf
>> based on the network namespace).
>>
>> (3) the mix of (1) and (2) force an administrator to regularly check and
>> clean these cgroups.
>>
>> This patchset removes the ns_cgroup by adding a new flag to the cgroup
>> and the cgroupfs mount option. It enables the copy of the parent cgroup
>> when a child cgroup is created. We can then safely remove the ns_cgroup as
>> this flag brings a compatibility. We have now to manually create and add the
>> task to a cgroup, which is consistent with the cgroup framework.
>>
> So this is a non-backward-compatible userspace-visible change?
>
> What are the implications of this?
>
An application will have to create a directory in the cgroup directory
and write the pid in the tasks file, instead of assuming it is
automatically created with the unshare/clone. The cgroupfs should be
mounted with the 'clone_children' option set.
AFAIK, I am the only one, with the lxc tools, using the ns_cgroup and I
will be happy to get rid of it. People are used to changing the default
cgroup mount options to mount all the subsystems except the ns_cgroup
(for example, this is needed for libvirt, if I am not wrong). IMHO, very
few people will be impacted, if any.
-- Daniel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927203658.GA5320-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2010-09-27 20:45 ` Eric W. Biederman
[not found] ` <m1zkv2ncex.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-09-27 20:46 ` Andrew Morton
1 sibling, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2010-09-27 20:45 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Andrew Morton, Paul Menage
"Serge E. Hallyn" <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
> Quoting Andrew Morton (akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org):
>> On Mon, 27 Sep 2010 12:14:10 +0200
>> Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
>>
>> > The ns_cgroup is a control group interacting with the namespaces.
>> > When a new namespace is created, a corresponding cgroup is
>> > automatically created too. The cgroup name is the pid of the process
>> > who did 'unshare' or the child of 'clone'.
>> >
>> > This cgroup is tied with the namespace because it prevents a
>> > process to escape the control group and use the post_clone callback,
>> > so the child cgroup inherits the values of the parent cgroup.
>> >
>> > Unfortunately, the more we use this cgroup and the more we are facing
>> > problems with it:
>> >
>> > (1) when a process unshares, the cgroup name may conflict with a previous
>> > cgroup with the same pid, so unshare or clone return -EEXIST
>> >
>> > (2) the cgroup creation is out of control because there may have an
>> > application creating several namespaces where the system will automatically
>> > create several cgroups in his back and let them on the cgroupfs (eg. a vrf
>> > based on the network namespace).
>> >
>> > (3) the mix of (1) and (2) force an administrator to regularly check and
>> > clean these cgroups.
>> >
>> > This patchset removes the ns_cgroup by adding a new flag to the cgroup
>> > and the cgroupfs mount option. It enables the copy of the parent cgroup
>> > when a child cgroup is created. We can then safely remove the ns_cgroup as
>> > this flag brings a compatibility. We have now to manually create and add the
>> > task to a cgroup, which is consistent with the cgroup framework.
>>
>> So this is a non-backward-compatible userspace-visible change?
>
> Yes, it is.
>
We have always been able to compile out the ns cgroup right?
In which case this is not a backwards incompatible change so much
as the permanent removal of a borked kernel feature.
Eric
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927203658.GA5320-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2010-09-27 20:45 ` Eric W. Biederman
@ 2010-09-27 20:46 ` Andrew Morton
[not found] ` <20100927134619.ecefe9f4.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-09-27 20:46 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Paul Menage,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
On Mon, 27 Sep 2010 15:36:58 -0500
"Serge E. Hallyn" <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> > > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> > > and the cgroupfs mount option. It enables the copy of the parent cgroup
> > > when a child cgroup is created. We can then safely remove the ns_cgroup as
> > > this flag brings a compatibility. We have now to manually create and add the
> > > task to a cgroup, which is consistent with the cgroup framework.
> >
> > So this is a non-backward-compatible userspace-visible change?
>
> Yes, it is.
>
> Patch 1 is needed to let lxc and libvirt both control containers with
> same cgroup setup. Patch 3 however isn't *necessary* for that. Daniel,
> what do you think about holding off on patch 3?
One way of handling this would be to merge patches 1&2 which add the
new interface and also arrange for usage of the old interface(s) to
emit a printk, telling people that they're using a feature which is
scheduled for removal.
Later, we remove the old interface.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <m1zkv2ncex.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2010-09-27 20:50 ` Andrew Morton
[not found] ` <20100927135025.74e297c3.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2010-09-27 20:50 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Menage,
Paul-FOgKQjlUJ6BQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
On Mon, 27 Sep 2010 13:45:26 -0700
ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
> "Serge E. Hallyn" <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>
> > Quoting Andrew Morton (akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org):
> >> On Mon, 27 Sep 2010 12:14:10 +0200
> >> Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
> >>
> >> > The ns_cgroup is a control group interacting with the namespaces.
> >> > When a new namespace is created, a corresponding cgroup is
> >> > automatically created too. The cgroup name is the pid of the process
> >> > who did 'unshare' or the child of 'clone'.
> >> >
> >> > This cgroup is tied with the namespace because it prevents a
> >> > process to escape the control group and use the post_clone callback,
> >> > so the child cgroup inherits the values of the parent cgroup.
> >> >
> >> > Unfortunately, the more we use this cgroup and the more we are facing
> >> > problems with it:
> >> >
> >> > (1) when a process unshares, the cgroup name may conflict with a previous
> >> > cgroup with the same pid, so unshare or clone return -EEXIST
> >> >
> >> > (2) the cgroup creation is out of control because there may have an
> >> > application creating several namespaces where the system will automatically
> >> > create several cgroups in his back and let them on the cgroupfs (eg. a vrf
> >> > based on the network namespace).
> >> >
> >> > (3) the mix of (1) and (2) force an administrator to regularly check and
> >> > clean these cgroups.
> >> >
> >> > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> >> > and the cgroupfs mount option. It enables the copy of the parent cgroup
> >> > when a child cgroup is created. We can then safely remove the ns_cgroup as
> >> > this flag brings a compatibility. We have now to manually create and add the
> >> > task to a cgroup, which is consistent with the cgroup framework.
> >>
> >> So this is a non-backward-compatible userspace-visible change?
> >
> > Yes, it is.
> >
>
> We have always been able to compile out the ns cgroup right?
>
> In which case this is not a backwards incompatible change so much
> as the permanent removal of a borked kernel feature.
>
That's just spin ;) People whose code is dependent on the old feature
get screwed over if we remove it.
And sure, it's unlikely that many people at all are using this feature.
But it's amazing what goes on out there and if the cost to us of
providing people with a migration period isn't too high then why not do
it?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927134619.ecefe9f4.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2010-09-27 20:53 ` Serge E. Hallyn
2010-09-28 13:50 ` Daniel Lezcano
1 sibling, 0 replies; 17+ messages in thread
From: Serge E. Hallyn @ 2010-09-27 20:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman, Paul Menage
Quoting Andrew Morton (akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org):
> On Mon, 27 Sep 2010 15:36:58 -0500
> "Serge E. Hallyn" <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>
> > > > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> > > > and the cgroupfs mount option. It enables the copy of the parent cgroup
> > > > when a child cgroup is created. We can then safely remove the ns_cgroup as
> > > > this flag brings a compatibility. We have now to manually create and add the
> > > > task to a cgroup, which is consistent with the cgroup framework.
> > >
> > > So this is a non-backward-compatible userspace-visible change?
> >
> > Yes, it is.
> >
> > Patch 1 is needed to let lxc and libvirt both control containers with
> > same cgroup setup. Patch 3 however isn't *necessary* for that. Daniel,
> > what do you think about holding off on patch 3?
>
> One way of handling this would be to merge patches 1&2 which add the
> new interface and also arrange for usage of the old interface(s) to
> emit a printk, telling people that they're using a feature which is
> scheduled for removal.
>
> Later, we remove the old interface.
I see no downside to that.
thanks,
-serge
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927135025.74e297c3.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2010-09-27 21:13 ` Eric W. Biederman
0 siblings, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2010-09-27 21:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Paul Menage
Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> writes:
>> We have always been able to compile out the ns cgroup right?
>>
>> In which case this is not a backwards incompatible change so much
>> as the permanent removal of a borked kernel feature.
>>
>
> That's just spin ;) People whose code is dependent on the old feature
> get screwed over if we remove it.
Sure. I'm just really pointing out that there is a difference
between changing the semantics and completely removing something.
It makes the change discoverable and something you can cope with,
if you so choose.
Overall I agree with a transition period, just look at sys_sysctl. Bleh.
Eric
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <20100927134619.ecefe9f4.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-27 20:53 ` Serge E. Hallyn
@ 2010-09-28 13:50 ` Daniel Lezcano
[not found] ` <4CA1F299.3000603-GANU6spQydw@public.gmane.org>
1 sibling, 1 reply; 17+ messages in thread
From: Daniel Lezcano @ 2010-09-28 13:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Paul Menage, Eric W. Biederman
On 09/27/2010 10:46 PM, Andrew Morton wrote:
> On Mon, 27 Sep 2010 15:36:58 -0500
> "Serge E. Hallyn"<serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>
>
>>>> This patchset removes the ns_cgroup by adding a new flag to the cgroup
>>>> and the cgroupfs mount option. It enables the copy of the parent cgroup
>>>> when a child cgroup is created. We can then safely remove the ns_cgroup as
>>>> this flag brings a compatibility. We have now to manually create and add the
>>>> task to a cgroup, which is consistent with the cgroup framework.
>>>>
>>> So this is a non-backward-compatible userspace-visible change?
>>>
>> Yes, it is.
>>
>> Patch 1 is needed to let lxc and libvirt both control containers with
>> same cgroup setup. Patch 3 however isn't *necessary* for that. Daniel,
>> what do you think about holding off on patch 3?
>>
> One way of handling this would be to merge patches 1&2 which add the
> new interface and also arrange for usage of the old interface(s) to
> emit a printk, telling people that they're using a feature which is
> scheduled for removal.
>
Right, that makes sense.
Will you take patches #1 and #2, drop patch #3, and let me send a
new patch with the printk warning?
Or shall I resend everything?
Thanks
-- Daniel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/3][V2] remove the ns_cgroup
[not found] ` <4CA1F299.3000603-GANU6spQydw@public.gmane.org>
@ 2010-09-28 21:27 ` Andrew Morton
0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2010-09-28 21:27 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Serge E. Hallyn,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Paul Menage, Eric W. Biederman
On Tue, 28 Sep 2010 15:50:17 +0200
Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
> On 09/27/2010 10:46 PM, Andrew Morton wrote:
> > On Mon, 27 Sep 2010 15:36:58 -0500
> > "Serge E. Hallyn"<serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> >
> >
> >>>> This patchset removes the ns_cgroup by adding a new flag to the cgroup
> >>>> and the cgroupfs mount option. It enables the copy of the parent cgroup
> >>>> when a child cgroup is created. We can then safely remove the ns_cgroup as
> >>>> this flag brings a compatibility. We have now to manually create and add the
> >>>> task to a cgroup, which is consistent with the cgroup framework.
> >>>>
> >>> So this is a non-backward-compatible userspace-visible change?
> >>>
> >> Yes, it is.
> >>
> >> Patch 1 is needed to let lxc and libvirt both control containers with
> >> same cgroup setup. Patch 3 however isn't *necessary* for that. Daniel,
> >> what do you think about holding off on patch 3?
> >>
> > One way of handling this would be to merge patches 1&2 which add the
> > new interface and also arrange for usage of the old interface(s) to
> > emit a printk, telling people that they're using a feature which is
> > scheduled for removal.
> >
>
> Right, that makes sense.
>
> Will you take patches #1 and #2, drop patch #3, and let me send a
> new patch with the printk warning?
> Or shall I resend everything?
I dropped #3. Please send the printk-warning patch. I'd suggest a
printk_once(), nice and verbose.
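
For readers unfamiliar with printk_once(): it prints its message the
first time the call site is reached and stays silent afterwards. A small
userspace model of that behaviour, as a sketch only; the function name
and the message wording are made up for illustration, and this simple
version is not thread-safe:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace stand-in for a printk_once()-style deprecation warning.
 * Returns true if this call actually printed the message.
 */
bool ns_cgroup_warn_deprecated(void)
{
	static bool warned;	/* remembers whether we already warned */

	if (warned)
		return false;
	warned = true;

	/* Hypothetical wording, verbose on purpose. */
	fprintf(stderr,
		"ns_cgroup is deprecated and scheduled for removal: "
		"create the cgroup manually and use the "
		"'clone_children' flag instead.\n");
	return true;
}
```

However many namespaces get created, the log receives the warning once.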
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3][V2] cgroup : add clone_children control file
[not found] ` <1285582453-6127-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2010-09-27 13:34 ` Serge E. Hallyn
2010-09-27 14:11 ` Balbir Singh
@ 2010-09-29 23:12 ` Paul Menage
2 siblings, 0 replies; 17+ messages in thread
From: Paul Menage @ 2010-09-29 23:12 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Serge E. Hallyn, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Eric W. Biederman
On Mon, Sep 27, 2010 at 3:14 AM, Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org> wrote:
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
> Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Cc: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Acked-by: Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> Documentation/cgroups/cgroups.txt | 14 +++++++++++-
> include/linux/cgroup.h | 4 +++
> kernel/cgroup.c | 39 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index b34823f..190018b 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -18,7 +18,8 @@ CONTENTS:
> 1.2 Why are cgroups needed ?
> 1.3 How are cgroups implemented ?
> 1.4 What does notify_on_release do ?
> - 1.5 How do I use cgroups ?
> + 1.5 What does clone_children do ?
> + 1.6 How do I use cgroups ?
> 2. Usage Examples and Syntax
> 2.1 Basic Usage
> 2.2 Attaching processes
> @@ -293,7 +294,16 @@ notify_on_release in the root cgroup at system boot is disabled
> value of their parents notify_on_release setting. The default value of
> a cgroup hierarchy's release_agent path is empty.
>
> -1.5 How do I use cgroups ?
> +1.5 What does clone_children do ?
> +---------------------------------
> +
> +If the clone_children flag is enabled (1) in a cgroup, then all
> +cgroups created beneath will call the post_clone callbacks for each
> +subsystem of the newly created cgroup. Usually when this callback is
> +implemented for a subsystem, it copies the values of the parent
> +subsystem, this is the case for the cpuset.
> +
> +1.6 How do I use cgroups ?
> --------------------------
>
> To start a new job that is to be contained within a cgroup, using
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 709dfb9..ed4ba11 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -154,6 +154,10 @@ enum {
> * A thread in rmdir() is wating for this cgroup.
> */
> CGRP_WAIT_ON_RMDIR,
> + /*
> + * Clone cgroup values when creating a new child cgroup
> + */
> + CGRP_CLONE_CHILDREN,
> };
>
> /* which pidlist file are we talking about? */
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 7b69b8d..7b17c3e 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -243,6 +243,11 @@ static int notify_on_release(const struct cgroup *cgrp)
> return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
> }
>
> +static int clone_children(const struct cgroup *cgrp)
> +{
> + return test_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
> +}
> +
> /*
> * for_each_subsys() allows you to iterate on each subsystem attached to
> * an active hierarchy
> @@ -1039,6 +1044,8 @@ static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
> seq_puts(seq, ",noprefix");
> if (strlen(root->release_agent_path))
> seq_printf(seq, ",release_agent=%s", root->release_agent_path);
> + if (clone_children(&root->top_cgroup))
> + seq_puts(seq, ",clone_children");
> if (strlen(root->name))
> seq_printf(seq, ",name=%s", root->name);
> mutex_unlock(&cgroup_mutex);
> @@ -1049,6 +1056,7 @@ struct cgroup_sb_opts {
> unsigned long subsys_bits;
> unsigned long flags;
> char *release_agent;
> + bool clone_children;
> char *name;
> /* User explicitly requested empty subsystem */
> bool none;
> @@ -1096,6 +1104,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
> opts->none = true;
> } else if (!strcmp(token, "noprefix")) {
> set_bit(ROOT_NOPREFIX, &opts->flags);
> + } else if (!strcmp(token, "clone_children")) {
> + opts->clone_children = true;
> } else if (!strncmp(token, "release_agent=", 14)) {
> /* Specifying two release agents is forbidden */
> if (opts->release_agent)
> @@ -1354,6 +1364,8 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
> strcpy(root->release_agent_path, opts->release_agent);
> if (opts->name)
> strcpy(root->name, opts->name);
> + if (opts->clone_children)
> + set_bit(CGRP_CLONE_CHILDREN, &root->top_cgroup.flags);
> return root;
> }
>
> @@ -3172,6 +3184,23 @@ fail:
> return ret;
> }
>
> +static u64 cgroup_clone_children_read(struct cgroup *cgrp,
> + struct cftype *cft)
> +{
> + return clone_children(cgrp);
> +}
> +
> +static int cgroup_clone_children_write(struct cgroup *cgrp,
> + struct cftype *cft,
> + u64 val)
> +{
> + if (val)
> + set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
> + else
> + clear_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
> + return 0;
> +}
> +
> /*
> * for the common functions, 'private' gives the type of file
> */
> @@ -3202,6 +3231,11 @@ static struct cftype files[] = {
> .write_string = cgroup_write_event_control,
> .mode = S_IWUGO,
> },
> + {
> + .name = "cgroup.clone_children",
> + .read_u64 = cgroup_clone_children_read,
> + .write_u64 = cgroup_clone_children_write,
> + },
> };
>
> static struct cftype cft_release_agent = {
> @@ -3331,6 +3365,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
> if (notify_on_release(parent))
> set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
>
> + if (clone_children(parent))
> + set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
> +
> for_each_subsys(root, ss) {
> struct cgroup_subsys_state *css = ss->create(ss, cgrp);
>
> @@ -3345,6 +3382,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
> goto err_destroy;
> }
> /* At error, ->destroy() callback has to free assigned ID. */
> + if (clone_children(parent) && ss->post_clone)
> + ss->post_clone(ss, cgrp);
> }
>
> cgroup_lock_hierarchy(root);
> --
> 1.7.0.4
>
>
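For readers following the cgroup_create() hunks above, here is a minimal userspace sketch of the inheritance rule they implement. It is a model, not kernel code: struct model_cgroup, MODEL_CLONE_CHILDREN, the model_* helpers and the single "value" field (standing in for whatever a subsystem's post_clone callback would copy) are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CGRP_CLONE_CHILDREN bit in cgrp->flags. */
enum { MODEL_CLONE_CHILDREN = 1u << 0 };

/* Hypothetical, heavily reduced model of a cgroup. */
struct model_cgroup {
	unsigned long flags;
	int value;	/* stands in for a subsystem setting copied by post_clone */
};

/* Mirrors clone_children() in the patch: test the flag on a cgroup. */
static bool model_clone_children(const struct model_cgroup *cg)
{
	return (cg->flags & MODEL_CLONE_CHILDREN) != 0;
}

/*
 * Mirrors the two hunks in cgroup_create(): the child inherits the flag
 * from its parent, and the subsystem settings are copied only when the
 * parent has the flag set.
 */
static void model_cgroup_create(const struct model_cgroup *parent,
				struct model_cgroup *child)
{
	assert(parent != NULL && child != NULL);

	child->flags = 0;
	child->value = 0;	/* subsystem default */

	if (model_clone_children(parent)) {
		child->flags |= MODEL_CLONE_CHILDREN;
		child->value = parent->value;	/* what post_clone would do */
	}
}
```

In this model, a child created under a parent with the flag set both copies the parent's value and carries the flag on to its own children. Clearing the flag on an intermediate cgroup (as a write of 0 to cgroup.clone_children does in the patch) stops the propagation from that point down.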
Thread overview: 17+ messages
2010-09-27 10:14 [PATCH 0/3][V2] remove the ns_cgroup Daniel Lezcano
[not found] ` <1285582453-6127-1-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2010-09-27 10:14 ` [PATCH 1/3][V2] cgroup : add clone_children control file Daniel Lezcano
[not found] ` <1285582453-6127-2-git-send-email-daniel.lezcano-GANU6spQydw@public.gmane.org>
2010-09-27 13:34 ` Serge E. Hallyn
2010-09-27 14:11 ` Balbir Singh
2010-09-29 23:12 ` Paul Menage
2010-09-27 10:14 ` [PATCH 2/3][V2] cgroup : make the mount options parsing more accurate Daniel Lezcano
2010-09-27 10:14 ` [PATCH 3/3][V2] cgroup : remove the ns_cgroup Daniel Lezcano
2010-09-27 19:57 ` [PATCH 0/3][V2] " Andrew Morton
[not found] ` <20100927125741.0df22f09.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-27 20:36 ` Serge E. Hallyn
[not found] ` <20100927203658.GA5320-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2010-09-27 20:45 ` Eric W. Biederman
[not found] ` <m1zkv2ncex.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-09-27 20:50 ` Andrew Morton
[not found] ` <20100927135025.74e297c3.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-27 21:13 ` Eric W. Biederman
2010-09-27 20:46 ` Andrew Morton
[not found] ` <20100927134619.ecefe9f4.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-09-27 20:53 ` Serge E. Hallyn
2010-09-28 13:50 ` Daniel Lezcano
[not found] ` <4CA1F299.3000603-GANU6spQydw@public.gmane.org>
2010-09-28 21:27 ` Andrew Morton
2010-09-27 20:43 ` Daniel Lezcano