* [PATCH v4 0/4] cgroups: support for module-loadable subsystems
@ 2009-12-31 5:10 Ben Blum
2009-12-31 5:12 ` [PATCH v4 1/4] cgroups: revamp subsys array Ben Blum
` (4 more replies)
0 siblings, 5 replies; 28+ messages in thread
From: Ben Blum @ 2009-12-31 5:10 UTC (permalink / raw)
To: linux-kernel, containers, lizf, akpm, menage, bblum
[This is a revision of http://lkml.org/lkml/2009/12/21/211 ]
This patch series implements support for building, loading, and
unloading subsystems as modules, both within and outside the kernel
source tree. It provides an interface cgroup_load_subsys() and
cgroup_unload_subsys() which modular subsystems can use to register and
depart during runtime. The net_cls classifier subsystem serves as the
example for a subsystem which can be converted into a module using these
changes.
Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear. Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section
of the array is protected by cgroup_mutex.
Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.
Patch #3 adds a mechanism for unloading modular subsystems, which
includes a more advanced rework of the rudimentary reference counting
introduced in patch 2.
Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.
Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves. In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy). The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys
was the only one in the dummy cgroup (and therefore all css_sets should
be unlinked from it) - however, as all css_sets always stay attached to
the dummy cgroup anyway, these cases are ignored. Any fix that addresses
this issue should also make sure these cases are addressed in the
subsystem loading and unloading code.
-- bblum
---
Documentation/cgroups/cgroups.txt | 9
include/linux/cgroup.h | 18 +
kernel/cgroup.c | 388 +++++++++++++++++++++++++++++++++-----
net/sched/Kconfig | 5
net/sched/cls_cgroup.c | 36 ++-
5 files changed, 400 insertions(+), 56 deletions(-)
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v4 1/4] cgroups: revamp subsys array
2009-12-31 5:10 [PATCH v4 0/4] cgroups: support for module-loadable subsystems Ben Blum
@ 2009-12-31 5:12 ` Ben Blum
2009-12-31 5:13 ` [PATCH v4 2/4] cgroups: subsystem module loading interface Ben Blum
` (3 subsequent siblings)
4 siblings, 0 replies; 28+ messages in thread
From: Ben Blum @ 2009-12-31 5:12 UTC (permalink / raw)
To: linux-kernel, containers, lizf, akpm, menage, bblum
[-- Attachment #1: cgroups-revamp-subsys-array.patch --]
[-- Type: text/plain, Size: 10542 bytes --]
Make subsys[] able to be dynamically populated to support modular subsystems
From: Ben Blum <bblum@andrew.cmu.edu>
This patch reworks the way the subsys[] array is used so that subsystems can
register themselves after boot time, and enables the internals of cgroups to
be able to handle when subsystems are not present or may appear/disappear.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/cgroup.h | 10 ++++-
kernel/cgroup.c | 96 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 88 insertions(+), 18 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0008dee..83da43d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -39,13 +39,19 @@ extern int cgroupstats_build(struct cgroupstats *stats,
extern const struct file_operations proc_cgroup_operations;
-/* Define the enumeration of all cgroup subsystems */
+/* Define the enumeration of all builtin cgroup subsystems */
#define SUBSYS(_x) _x ## _subsys_id,
enum cgroup_subsys_id {
#include <linux/cgroup_subsys.h>
- CGROUP_SUBSYS_COUNT
+ CGROUP_BUILTIN_SUBSYS_COUNT
};
#undef SUBSYS
+/*
+ * This define indicates the maximum number of subsystems that can be loaded
+ * at once. We limit to this many since cgroupfs_root has subsys_bits to keep
+ * track of all of them.
+ */
+#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
/* Per-subsystem/per-cgroup state maintained by the system. */
struct cgroup_subsys_state {
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0249f4b..402e828 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,10 +56,14 @@
static DEFINE_MUTEX(cgroup_mutex);
-/* Generate an array of cgroup subsystem pointers */
+/*
+ * Generate an array of cgroup subsystem pointers. At boot time, this is
+ * populated up to CGROUP_BUILTIN_SUBSYS_COUNT, and modular subsystems are
+ * registered after that. The mutable section of this array is protected by
+ * cgroup_mutex.
+ */
#define SUBSYS(_x) &_x ## _subsys,
-
-static struct cgroup_subsys *subsys[] = {
+static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
#include <linux/cgroup_subsys.h>
};
@@ -433,8 +437,11 @@ static struct css_set *find_existing_css_set(
struct hlist_node *node;
struct css_set *cg;
- /* Built the set of subsystem state objects that we want to
- * see in the new css_set */
+ /*
+ * Build the set of subsystem state objects that we want to see in the
+ * new css_set. while subsystems can change globally, the entries here
+ * won't change, so no need for locking.
+ */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
if (root->subsys_bits & (1UL << i)) {
/* Subsystem is in this hierarchy. So we want
@@ -869,7 +876,9 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
css_put(css);
}
-
+/*
+ * Call with cgroup_mutex held.
+ */
static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long final_bits)
{
@@ -877,6 +886,8 @@ static int rebind_subsystems(struct cgroupfs_root *root,
struct cgroup *cgrp = &root->top_cgroup;
int i;
+ BUG_ON(!mutex_is_locked(&cgroup_mutex));
+
removed_bits = root->actual_subsys_bits & ~final_bits;
added_bits = final_bits & ~root->actual_subsys_bits;
/* Check that any added subsystems are currently free */
@@ -885,6 +896,12 @@ static int rebind_subsystems(struct cgroupfs_root *root,
struct cgroup_subsys *ss = subsys[i];
if (!(bit & added_bits))
continue;
+ /*
+ * Nobody should tell us to do a subsys that doesn't exist:
+ * parse_cgroupfs_options should catch that case and refcounts
+ * ensure that subsystems won't disappear once selected.
+ */
+ BUG_ON(ss == NULL);
if (ss->root != &rootnode) {
/* Subsystem isn't free */
return -EBUSY;
@@ -904,6 +921,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long bit = 1UL << i;
if (bit & added_bits) {
/* We're binding this subsystem to this hierarchy */
+ BUG_ON(ss == NULL);
BUG_ON(cgrp->subsys[i]);
BUG_ON(!dummytop->subsys[i]);
BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
@@ -917,6 +935,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
mutex_unlock(&ss->hierarchy_mutex);
} else if (bit & removed_bits) {
/* We're removing this subsystem */
+ BUG_ON(ss == NULL);
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
mutex_lock(&ss->hierarchy_mutex);
@@ -929,6 +948,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
mutex_unlock(&ss->hierarchy_mutex);
} else if (bit & final_bits) {
/* Subsystem state should already exist */
+ BUG_ON(ss == NULL);
BUG_ON(!cgrp->subsys[i]);
} else {
/* Subsystem state shouldn't exist */
@@ -971,14 +991,18 @@ struct cgroup_sb_opts {
};
-/* Convert a hierarchy specifier into a bitmask of subsystems and
- * flags. */
+/*
+ * Convert a hierarchy specifier into a bitmask of subsystems and flags. Call
+ * with cgroup_mutex held to protect the subsys[] array.
+ */
static int parse_cgroupfs_options(char *data,
struct cgroup_sb_opts *opts)
{
char *token, *o = data ?: "all";
unsigned long mask = (unsigned long)-1;
+ BUG_ON(!mutex_is_locked(&cgroup_mutex));
+
#ifdef CONFIG_CPUSETS
mask = ~(1UL << cpuset_subsys_id);
#endif
@@ -994,6 +1018,8 @@ static int parse_cgroupfs_options(char *data,
opts->subsys_bits = 0;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (!ss->disabled)
opts->subsys_bits |= 1ul << i;
}
@@ -1038,6 +1064,8 @@ static int parse_cgroupfs_options(char *data,
int i;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (!strcmp(token, ss->name)) {
if (!ss->disabled)
set_bit(i, &opts->subsys_bits);
@@ -1291,7 +1319,9 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
struct cgroupfs_root *new_root;
/* First find the desired set of subsystems */
+ mutex_lock(&cgroup_mutex);
ret = parse_cgroupfs_options(data, &opts);
+ mutex_unlock(&cgroup_mutex);
if (ret)
goto out_err;
@@ -2878,8 +2908,14 @@ static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
/* We need to take each hierarchy_mutex in a consistent order */
int i;
+ /*
+ * No worry about a race with rebind_subsystems that might mess up the
+ * locking order, since both parties are under cgroup_mutex.
+ */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (ss->root == root)
mutex_lock(&ss->hierarchy_mutex);
}
@@ -2891,6 +2927,8 @@ static void cgroup_unlock_hierarchy(struct cgroupfs_root *root)
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (ss->root == root)
mutex_unlock(&ss->hierarchy_mutex);
}
@@ -3011,11 +3049,16 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
* synchronization other than RCU, and the subsystem linked
* list isn't RCU-safe */
int i;
+ /*
+ * We won't need to lock the subsys array, because the subsystems
+ * we're concerned about aren't going anywhere since our cgroup root
+ * has a reference on them.
+ */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
struct cgroup_subsys_state *css;
- /* Skip subsystems not in this hierarchy */
- if (ss->root != cgrp->root)
+ /* Skip subsystems not present or not in this hierarchy */
+ if (ss == NULL || ss->root != cgrp->root)
continue;
css = cgrp->subsys[ss->subsys_id];
/* When called from check_for_release() it's possible
@@ -3236,7 +3279,8 @@ int __init cgroup_init_early(void)
for (i = 0; i < CSS_SET_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&css_set_table[i]);
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /* at bootup time, we don't worry about modular subsystems */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
BUG_ON(!ss->name);
@@ -3271,7 +3315,8 @@ int __init cgroup_init(void)
if (err)
return err;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /* at bootup time, we don't worry about modular subsystems */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (!ss->early_init)
cgroup_init_subsys(ss);
@@ -3380,9 +3425,16 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
int i;
seq_puts(m, "#subsys_name\thierarchy\tnum_cgroups\tenabled\n");
+ /*
+ * ideally we don't want subsystems moving around while we do this.
+ * cgroup_mutex is also necessary to guarantee an atomic snapshot of
+ * subsys/hierarchy state.
+ */
mutex_lock(&cgroup_mutex);
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
seq_printf(m, "%s\t%d\t%d\t%d\n",
ss->name, ss->root->hierarchy_id,
ss->root->number_of_cgroups, !ss->disabled);
@@ -3440,7 +3492,12 @@ void cgroup_fork_callbacks(struct task_struct *child)
{
if (need_forkexit_callback) {
int i;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /*
+ * forkexit callbacks are only supported for builtin
+ * subsystems, and the builtin section of the subsys array is
+ * immutable, so we don't need to lock the subsys array here.
+ */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->fork)
ss->fork(ss, child);
@@ -3509,7 +3566,11 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
struct css_set *cg;
if (run_callbacks && need_forkexit_callback) {
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /*
+ * modular subsystems can't use callbacks, so no need to lock
+ * the subsys array
+ */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->exit)
ss->exit(ss, tsk);
@@ -3800,8 +3861,11 @@ static int __init cgroup_disable(char *str)
while ((token = strsep(&str, ",")) != NULL) {
if (!*token)
continue;
-
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /*
+ * cgroup_disable, being at boot time, can't know about module
+ * subsystems, so we don't worry about them.
+ */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (!strcmp(token, ss->name)) {
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v4 2/4] cgroups: subsystem module loading interface
2009-12-31 5:10 [PATCH v4 0/4] cgroups: support for module-loadable subsystems Ben Blum
2009-12-31 5:12 ` [PATCH v4 1/4] cgroups: revamp subsys array Ben Blum
@ 2009-12-31 5:13 ` Ben Blum
2009-12-31 5:14 ` [PATCH v4 3/4] cgroups: subsystem module unloading Ben Blum
` (2 subsequent siblings)
4 siblings, 0 replies; 28+ messages in thread
From: Ben Blum @ 2009-12-31 5:13 UTC (permalink / raw)
To: linux-kernel, containers, lizf, akpm, menage, bblum
[-- Attachment #1: Type: text/plain, Size: 2611 bytes --]
On Thu, Dec 31, 2009 at 12:10:50AM -0500, Ben Blum wrote:
> [This is a revision of http://lkml.org/lkml/2009/12/21/211 ]
>
> This patch series implements support for building, loading, and
> unloading subsystems as modules, both within and outside the kernel
> source tree. It provides an interface cgroup_load_subsys() and
> cgroup_unload_subsys() which modular subsystems can use to register and
> depart during runtime. The net_cls classifier subsystem serves as the
> example for a subsystem which can be converted into a module using these
> changes.
>
> Patch #1 sets up the subsys[] array so its contents can be dynamic as
> modules appear and (eventually) disappear. Iterations over the array are
> modified to handle when subsystems are absent, and the dynamic section
> of the array is protected by cgroup_mutex.
>
> Patch #2 implements an interface for modules to load subsystems, called
> cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
> pointer in struct cgroup_subsys.
>
> Patch #3 adds a mechanism for unloading modular subsystems, which
> includes a more advanced rework of the rudimentary reference counting
> introduced in patch 2.
>
> Patch #4 modifies the net_cls subsystem, which already had some module
> declarations, to be configurable as a module, which also serves as a
> simple proof-of-concept.
>
> Part of implementing patches 2 and 4 involved updating css pointers in
> each css_set when the module appears or leaves. In doing this, it was
> discovered that css_sets always remain linked to the dummy cgroup,
> regardless of whether or not any subsystems are actually bound to it
> (i.e., not mounted on an actual hierarchy). The subsystem loading and
> unloading code therefore should keep in mind the special cases where the
> added subsystem is the only one in the dummy cgroup (and therefore all
> css_sets need to be linked back into it) and where the removed subsys
> was the only one in the dummy cgroup (and therefore all css_sets should
> be unlinked from it) - however, as all css_sets always stay attached to
> the dummy cgroup anyway, these cases are ignored. Any fix that addresses
> this issue should also make sure these cases are addressed in the
> subsystem loading and unloading code.
>
> -- bblum
>
> ---
> Documentation/cgroups/cgroups.txt | 9
> include/linux/cgroup.h | 18 +
> kernel/cgroup.c | 388 +++++++++++++++++++++++++++++++++-----
> net/sched/Kconfig | 5
> net/sched/cls_cgroup.c | 36 ++-
> 5 files changed, 400 insertions(+), 56 deletions(-)
>
>
[-- Attachment #2: cgroups-subsys-module-interface.patch --]
[-- Type: text/plain, Size: 7310 bytes --]
Add interface between cgroups subsystem management and module loading
From: Ben Blum <bblum@andrew.cmu.edu>
This patch implements rudimentary module-loading support for cgroups - namely,
a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a module
initcall, and a struct module pointer in struct cgroup_subsys.
Several functions that might be wanted by modules have had EXPORT_SYMBOL added
to them, but it's unclear exactly which functions want it and which won't.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
---
Documentation/cgroups/cgroups.txt | 4 +
include/linux/cgroup.h | 4 +
kernel/cgroup.c | 128 +++++++++++++++++++++++++++++++++++++
3 files changed, 136 insertions(+), 0 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 0b33bfe..6ffcf81 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -488,6 +488,10 @@ Each subsystem should:
- add an entry in linux/cgroup_subsys.h
- define a cgroup_subsys object called <name>_subsys
+If a subsystem can be compiled as a module, it should also have in its
+module initcall a call to cgroup_load_subsys(&its_subsys_struct). It
+should also set its_subsys.module = THIS_MODULE in its .c file.
+
Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
be successful no-ops.
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 83da43d..9461aed 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -36,6 +36,7 @@ extern void cgroup_post_fork(struct task_struct *p);
extern void cgroup_exit(struct task_struct *p, int run_callbacks);
extern int cgroupstats_build(struct cgroupstats *stats,
struct dentry *dentry);
+extern int cgroup_load_subsys(struct cgroup_subsys *ss);
extern const struct file_operations proc_cgroup_operations;
@@ -477,6 +478,9 @@ struct cgroup_subsys {
/* used when use_id == true */
struct idr idr;
spinlock_t id_lock;
+
+ /* should be defined only by modular subsystems */
+ struct module *module;
};
#define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 402e828..d7ca4cf 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -43,6 +43,7 @@
#include <linux/string.h>
#include <linux/sort.h>
#include <linux/kmod.h>
+#include <linux/module.h>
#include <linux/delayacct.h>
#include <linux/cgroupstats.h>
#include <linux/hash.h>
@@ -2084,6 +2085,7 @@ int cgroup_add_file(struct cgroup *cgrp,
error = PTR_ERR(dentry);
return error;
}
+EXPORT_SYMBOL_GPL(cgroup_add_file);
int cgroup_add_files(struct cgroup *cgrp,
struct cgroup_subsys *subsys,
@@ -2098,6 +2100,7 @@ int cgroup_add_files(struct cgroup *cgrp,
}
return 0;
}
+EXPORT_SYMBOL_GPL(cgroup_add_files);
/**
* cgroup_task_count - count the number of tasks in a cgroup.
@@ -3249,7 +3252,132 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
mutex_init(&ss->hierarchy_mutex);
lockdep_set_class(&ss->hierarchy_mutex, &ss->subsys_key);
ss->active = 1;
+
+ /* this function shouldn't be used with modular subsystems, since they
+ * need to register a subsys_id, among other things */
+ BUG_ON(ss->module);
+}
+
+/**
+ * cgroup_load_subsys: load and register a modular subsystem at runtime
+ * @ss: the subsystem to load
+ *
+ * This function should be called in a modular subsystem's initcall. If the
+ * subsytem is built as a module, it will be assigned a new subsys_id and set
+ * up for use. If the subsystem is built-in anyway, work is delegated to the
+ * simpler cgroup_init_subsys.
+ */
+int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
+{
+ int i;
+ struct cgroup_subsys_state *css;
+
+ /* check name and function validity */
+ if (ss->name == NULL || strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN ||
+ ss->create == NULL || ss->destroy == NULL)
+ return -EINVAL;
+
+ /*
+ * we don't support callbacks in modular subsystems. this check is
+ * before the ss->module check for consistency; a subsystem that could
+ * be a module should still have no callbacks even if the user isn't
+ * compiling it as one.
+ */
+ if (ss->fork || ss->exit)
+ return -EINVAL;
+
+ /*
+ * an optionally modular subsystem is built-in: we want to do nothing,
+ * since cgroup_init_subsys will have already taken care of it.
+ */
+ if (ss->module == NULL) {
+ /* a few sanity checks */
+ BUG_ON(ss->subsys_id >= CGROUP_BUILTIN_SUBSYS_COUNT);
+ BUG_ON(subsys[ss->subsys_id] != ss);
+ return 0;
+ }
+
+ /*
+ * need to register a subsys id before anything else - for example,
+ * init_cgroup_css needs it.
+ */
+ mutex_lock(&cgroup_mutex);
+ /* find the first empty slot in the array */
+ for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ if (subsys[i] == NULL)
+ break;
+ }
+ if (i == CGROUP_SUBSYS_COUNT) {
+ /* maximum number of subsystems already registered! */
+ mutex_unlock(&cgroup_mutex);
+ return -EBUSY;
+ }
+ /* assign ourselves the subsys_id */
+ ss->subsys_id = i;
+ subsys[i] = ss;
+
+ /*
+ * no ss->create seems to need anything important in the ss struct, so
+ * this can happen first (i.e. before the rootnode attachment).
+ */
+ css = ss->create(ss, dummytop);
+ if (IS_ERR(css)) {
+ /* failure case - need to deassign the subsys[] slot. */
+ subsys[i] = NULL;
+ mutex_unlock(&cgroup_mutex);
+ return PTR_ERR(css);
+ }
+
+ list_add(&ss->sibling, &rootnode.subsys_list);
+ ss->root = &rootnode;
+
+ /* our new subsystem will be attached to the dummy hierarchy. */
+ init_cgroup_css(css, ss, dummytop);
+ /*
+ * Now we need to entangle the css into the existing css_sets. unlike
+ * in cgroup_init_subsys, there are now multiple css_sets, so each one
+ * will need a new pointer to it; done by iterating the css_set_table.
+ * furthermore, modifying the existing css_sets will corrupt the hash
+ * table state, so each changed css_set will need its hash recomputed.
+ * this is all done under the css_set_lock.
+ */
+ write_lock(&css_set_lock);
+ for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
+ struct css_set *cg;
+ struct hlist_node *node, *tmp;
+ struct hlist_head *bucket = &css_set_table[i], *new_bucket;
+
+ hlist_for_each_entry_safe(cg, node, tmp, bucket, hlist) {
+ /* skip entries that we already rehashed */
+ if (cg->subsys[ss->subsys_id])
+ continue;
+ /* remove existing entry */
+ hlist_del(&cg->hlist);
+ /* set new value */
+ cg->subsys[ss->subsys_id] = css;
+ /* recompute hash and restore entry */
+ new_bucket = css_set_hash(cg->subsys);
+ hlist_add_head(&cg->hlist, new_bucket);
+ }
+ }
+ write_unlock(&css_set_lock);
+
+ mutex_init(&ss->hierarchy_mutex);
+ lockdep_set_class(&ss->hierarchy_mutex, &ss->subsys_key);
+ ss->active = 1;
+
+ /*
+ * pin the subsystem's module so it doesn't go away. this shouldn't
+ * fail, since the module's initcall calls us.
+ * TODO: with module unloading, move this elsewhere
+ */
+ BUG_ON(!try_module_get(ss->module));
+
+ /* success! */
+ mutex_unlock(&cgroup_mutex);
+ return 0;
}
+EXPORT_SYMBOL_GPL(cgroup_load_subsys);
/**
* cgroup_init_early - cgroup initialization at system boot
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v4 3/4] cgroups: subsystem module unloading
2009-12-31 5:10 [PATCH v4 0/4] cgroups: support for module-loadable subsystems Ben Blum
2009-12-31 5:12 ` [PATCH v4 1/4] cgroups: revamp subsys array Ben Blum
2009-12-31 5:13 ` [PATCH v4 2/4] cgroups: subsystem module loading interface Ben Blum
@ 2009-12-31 5:14 ` Ben Blum
2009-12-31 5:15 ` [PATCH v4 4/4] cgroups: net_cls as module Ben Blum
2010-01-07 0:04 ` [PATCH v4 0/4] cgroups: support for module-loadable subsystems Andrew Morton
4 siblings, 0 replies; 28+ messages in thread
From: Ben Blum @ 2009-12-31 5:14 UTC (permalink / raw)
To: linux-kernel, containers, lizf, akpm, menage, bblum
[-- Attachment #1: cgroups-subsys-unloading.patch --]
[-- Type: text/plain, Size: 11729 bytes --]
Provides support for unloading modular subsystems.
From: Ben Blum <bblum@andrew.cmu.edu>
This patch adds a new function cgroup_unload_subsys which is to be used for
removing a loaded subsystem during module deletion. Reference counting of the
subsystems' modules is moved from once (at load time) to once per attached
hierarchy (in parse_cgroupfs_options and rebind_subsystems) (i.e., 0 or 1).
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
---
Documentation/cgroups/cgroups.txt | 5 +
include/linux/cgroup.h | 4 +
kernel/cgroup.c | 164 +++++++++++++++++++++++++++++++------
3 files changed, 145 insertions(+), 28 deletions(-)
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 6ffcf81..7527bac 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -489,8 +489,9 @@ Each subsystem should:
- define a cgroup_subsys object called <name>_subsys
If a subsystem can be compiled as a module, it should also have in its
-module initcall a call to cgroup_load_subsys(&its_subsys_struct). It
-should also set its_subsys.module = THIS_MODULE in its .c file.
+module initcall a call to cgroup_load_subsys(), and in its exitcall a
+call to cgroup_unload_subsys(). It should also set its_subsys.module =
+THIS_MODULE in its .c file.
Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9461aed..9be4c22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -37,6 +37,7 @@ extern void cgroup_exit(struct task_struct *p, int run_callbacks);
extern int cgroupstats_build(struct cgroupstats *stats,
struct dentry *dentry);
extern int cgroup_load_subsys(struct cgroup_subsys *ss);
+extern void cgroup_unload_subsys(struct cgroup_subsys *ss);
extern const struct file_operations proc_cgroup_operations;
@@ -264,7 +265,8 @@ struct css_set {
/*
* Set of subsystem states, one for each subsystem. This array
* is immutable after creation apart from the init_css_set
- * during subsystem registration (at boot time).
+ * during subsystem registration (at boot time) and modular subsystem
+ * loading/unloading.
*/
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d7ca4cf..cc2e1f6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -878,7 +878,9 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
}
/*
- * Call with cgroup_mutex held.
+ * Call with cgroup_mutex held. Drops reference counts on modules, including
+ * any duplicate ones that parse_cgroupfs_options took. If this function
+ * returns an error, no reference counts are touched.
*/
static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long final_bits)
@@ -934,6 +936,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
if (ss->bind)
ss->bind(ss, cgrp);
mutex_unlock(&ss->hierarchy_mutex);
+ /* refcount was already taken, and we're keeping it */
} else if (bit & removed_bits) {
/* We're removing this subsystem */
BUG_ON(ss == NULL);
@@ -947,10 +950,18 @@ static int rebind_subsystems(struct cgroupfs_root *root,
subsys[i]->root = &rootnode;
list_move(&ss->sibling, &rootnode.subsys_list);
mutex_unlock(&ss->hierarchy_mutex);
+ /* subsystem is now free - drop reference on module */
+ module_put(ss->module);
} else if (bit & final_bits) {
/* Subsystem state should already exist */
BUG_ON(ss == NULL);
BUG_ON(!cgrp->subsys[i]);
+ /*
+ * a refcount was taken, but we already had one, so
+ * drop the extra reference.
+ */
+ module_put(ss->module);
+ BUG_ON(ss->module && !module_refcount(ss->module));
} else {
/* Subsystem state shouldn't exist */
BUG_ON(cgrp->subsys[i]);
@@ -994,13 +1005,16 @@ struct cgroup_sb_opts {
/*
* Convert a hierarchy specifier into a bitmask of subsystems and flags. Call
- * with cgroup_mutex held to protect the subsys[] array.
+ * with cgroup_mutex held to protect the subsys[] array. This function takes
+ * refcounts on subsystems to be used, unless it returns error, in which case
+ * no refcounts are taken.
*/
-static int parse_cgroupfs_options(char *data,
- struct cgroup_sb_opts *opts)
+static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
{
char *token, *o = data ?: "all";
unsigned long mask = (unsigned long)-1;
+ int i;
+ bool module_pin_failed = false;
BUG_ON(!mutex_is_locked(&cgroup_mutex));
@@ -1015,7 +1029,6 @@ static int parse_cgroupfs_options(char *data,
return -EINVAL;
if (!strcmp(token, "all")) {
/* Add all non-disabled subsystems */
- int i;
opts->subsys_bits = 0;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
@@ -1038,7 +1051,6 @@ static int parse_cgroupfs_options(char *data,
if (!opts->release_agent)
return -ENOMEM;
} else if (!strncmp(token, "name=", 5)) {
- int i;
const char *name = token + 5;
/* Can't specify an empty name */
if (!strlen(name))
@@ -1062,7 +1074,6 @@ static int parse_cgroupfs_options(char *data,
return -ENOMEM;
} else {
struct cgroup_subsys *ss;
- int i;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
ss = subsys[i];
if (ss == NULL)
@@ -1101,9 +1112,54 @@ static int parse_cgroupfs_options(char *data,
if (!opts->subsys_bits && !opts->name)
return -EINVAL;
+ /*
+ * Grab references on all the modules we'll need, so the subsystems
+ * don't dance around before rebind_subsystems attaches them. This may
+ * take duplicate reference counts on a subsystem that's already used,
+ * but rebind_subsystems handles this case.
+ */
+ for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ unsigned long bit = 1UL << i;
+
+ if (!(bit & opts->subsys_bits))
+ continue;
+ if (!try_module_get(subsys[i]->module)) {
+ module_pin_failed = true;
+ break;
+ }
+ }
+ if (module_pin_failed) {
+ /*
+ * oops, one of the modules was going away. this means that we
+ * raced with a module_delete call, and to the user this is
+ * essentially a "subsystem doesn't exist" case.
+ */
+ for (i--; i >= CGROUP_BUILTIN_SUBSYS_COUNT; i--) {
+ /* drop refcounts only on the ones we took */
+ unsigned long bit = 1UL << i;
+
+ if (!(bit & opts->subsys_bits))
+ continue;
+ module_put(subsys[i]->module);
+ }
+ return -ENOENT;
+ }
+
return 0;
}
+static void drop_parsed_module_refcounts(unsigned long subsys_bits)
+{
+ int i;
+ for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ unsigned long bit = 1UL << i;
+
+ if (!(bit & subsys_bits))
+ continue;
+ module_put(subsys[i]->module);
+ }
+}
+
static int cgroup_remount(struct super_block *sb, int *flags, char *data)
{
int ret = 0;
@@ -1120,21 +1176,19 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
if (ret)
goto out_unlock;
- /* Don't allow flags to change at remount */
- if (opts.flags != root->flags) {
- ret = -EINVAL;
- goto out_unlock;
- }
-
- /* Don't allow name to change at remount */
- if (opts.name && strcmp(opts.name, root->name)) {
+ /* Don't allow flags or name to change at remount */
+ if (opts.flags != root->flags ||
+ (opts.name && strcmp(opts.name, root->name))) {
ret = -EINVAL;
+ drop_parsed_module_refcounts(opts.subsys_bits);
goto out_unlock;
}
ret = rebind_subsystems(root, opts.subsys_bits);
- if (ret)
+ if (ret) {
+ drop_parsed_module_refcounts(opts.subsys_bits);
goto out_unlock;
+ }
/* (re)populate subsystem files */
cgroup_populate_dir(cgrp);
@@ -1333,7 +1387,7 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
new_root = cgroup_root_from_opts(&opts);
if (IS_ERR(new_root)) {
ret = PTR_ERR(new_root);
- goto out_err;
+ goto drop_modules;
}
opts.new_root = new_root;
@@ -1342,7 +1396,7 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
if (IS_ERR(sb)) {
ret = PTR_ERR(sb);
cgroup_drop_root(opts.new_root);
- goto out_err;
+ goto drop_modules;
}
root = sb->s_fs_info;
@@ -1398,6 +1452,11 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
free_cg_links(&tmp_cg_links);
goto drop_new_super;
}
+ /*
+ * There must be no failure case after here, since rebinding
+ * takes care of subsystems' refcounts, which are explicitly
+ * dropped in the failure exit path.
+ */
/* EBUSY should be the only error here */
BUG_ON(ret);
@@ -1436,6 +1495,8 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
* any) is not needed
*/
cgroup_drop_root(opts.new_root);
+ /* no subsys rebinding, so refcounts don't change */
+ drop_parsed_module_refcounts(opts.subsys_bits);
}
simple_set_mnt(mnt, sb);
@@ -1445,6 +1506,8 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
drop_new_super:
deactivate_locked_super(sb);
+ drop_modules:
+ drop_parsed_module_refcounts(opts.subsys_bits);
out_err:
kfree(opts.release_agent);
kfree(opts.name);
@@ -3366,13 +3429,6 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
lockdep_set_class(&ss->hierarchy_mutex, &ss->subsys_key);
ss->active = 1;
- /*
- * pin the subsystem's module so it doesn't go away. this shouldn't
- * fail, since the module's initcall calls us.
- * TODO: with module unloading, move this elsewhere
- */
- BUG_ON(!try_module_get(ss->module));
-
/* success! */
mutex_unlock(&cgroup_mutex);
return 0;
@@ -3380,6 +3436,64 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
EXPORT_SYMBOL_GPL(cgroup_load_subsys);
/**
+ * cgroup_unload_subsys: unload a modular subsystem
+ * @ss: the subsystem to unload
+ *
+ * This function should be called in a modular subsystem's exitcall. When this
+ * function is invoked, the refcount on the subsystem's module will be 0, so
+ * the subsystem will not be attached to any hierarchy.
+ */
+void cgroup_unload_subsys(struct cgroup_subsys *ss)
+{
+ struct cg_cgroup_link *link;
+ struct hlist_head *hhead;
+
+ BUG_ON(ss->module == NULL);
+
+ /*
+ * we shouldn't be called if the subsystem is in use, and the use of
+ * try_module_get in parse_cgroupfs_options should ensure that it
+ * doesn't start being used while we're killing it off.
+ */
+ BUG_ON(ss->root != &rootnode);
+
+ mutex_lock(&cgroup_mutex);
+ /* deassign the subsys_id */
+ BUG_ON(ss->subsys_id < CGROUP_BUILTIN_SUBSYS_COUNT);
+ subsys[ss->subsys_id] = NULL;
+
+ /* remove subsystem from rootnode's list of subsystems */
+ list_del(&ss->sibling);
+
+ /*
+ * disentangle the css from all css_sets attached to the dummytop. as
+ * in loading, we need to pay our respects to the hashtable gods.
+ */
+ write_lock(&css_set_lock);
+ list_for_each_entry(link, &dummytop->css_sets, cgrp_link_list) {
+ struct css_set *cg = link->cg;
+
+ hlist_del(&cg->hlist);
+ BUG_ON(!cg->subsys[ss->subsys_id]);
+ cg->subsys[ss->subsys_id] = NULL;
+ hhead = css_set_hash(cg->subsys);
+ hlist_add_head(&cg->hlist, hhead);
+ }
+ write_unlock(&css_set_lock);
+
+ /*
+ * remove subsystem's css from the dummytop and free it - need to free
+ * before marking as null because ss->destroy needs the cgrp->subsys
+ * pointer to find their state.
+ */
+ ss->destroy(ss, dummytop);
+ dummytop->subsys[ss->subsys_id] = NULL;
+
+ mutex_unlock(&cgroup_mutex);
+}
+EXPORT_SYMBOL_GPL(cgroup_unload_subsys);
+
+/**
* cgroup_init_early - cgroup initialization at system boot
*
* Initialize cgroups at system boot, and initialize any
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v4 4/4] cgroups: net_cls as module
2009-12-31 5:10 [PATCH v4 0/4] cgroups: support for module-loadable subsystems Ben Blum
` (2 preceding siblings ...)
2009-12-31 5:14 ` [PATCH v4 3/4] cgroups: subsystem module unloading Ben Blum
@ 2009-12-31 5:15 ` Ben Blum
2010-01-07 0:04 ` [PATCH v4 0/4] cgroups: support for module-loadable subsystems Andrew Morton
4 siblings, 0 replies; 28+ messages in thread
From: Ben Blum @ 2009-12-31 5:15 UTC (permalink / raw)
To: linux-kernel, containers, lizf, akpm, menage, bblum; +Cc: netdev
[-- Attachment #1: cgroups-net_cls-as-module.patch --]
[-- Type: text/plain, Size: 3207 bytes --]
Allows the net_cls cgroup subsystem to be compiled as a module
From: Ben Blum <bblum@andrew.cmu.edu>
This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem to
be optionally compiled as a module instead of builtin. The cgroup_subsys
struct is moved around a bit to allow the subsys_id to be either declared as a
compile-time constant by the cgroup_subsys.h include in cgroup.h, or, if it's
a module, initialized within the struct by cgroup_load_subsys.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
---
net/sched/Kconfig | 5 ++++-
net/sched/cls_cgroup.c | 36 +++++++++++++++++++++++++++---------
2 files changed, 31 insertions(+), 10 deletions(-)
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 929218a..08ff9bc 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -328,13 +328,16 @@ config NET_CLS_FLOW
module will be called cls_flow.
config NET_CLS_CGROUP
- bool "Control Group Classifier"
+ tristate "Control Group Classifier"
select NET_CLS
depends on CGROUPS
---help---
Say Y here if you want to classify packets based on the control
cgroup of their process.
+ To compile this code as a module, choose M here: the
+ module will be called cls_cgroup.
+
config NET_EMATCH
bool "Extended Matches"
select NET_CLS
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index e4877ca..7f27d2c 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -24,6 +24,25 @@ struct cgroup_cls_state
u32 classid;
};
+static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
+ struct cgroup *cgrp);
+static void cgrp_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
+static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp);
+
+struct cgroup_subsys net_cls_subsys = {
+ .name = "net_cls",
+ .create = cgrp_create,
+ .destroy = cgrp_destroy,
+ .populate = cgrp_populate,
+#ifdef CONFIG_NET_CLS_CGROUP
+ .subsys_id = net_cls_subsys_id,
+#else
+#define net_cls_subsys_id net_cls_subsys.subsys_id
+#endif
+ .module = THIS_MODULE,
+};
+
+
static inline struct cgroup_cls_state *cgrp_cls_state(struct cgroup *cgrp)
{
return container_of(cgroup_subsys_state(cgrp, net_cls_subsys_id),
@@ -79,14 +98,6 @@ static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
return cgroup_add_files(cgrp, ss, ss_files, ARRAY_SIZE(ss_files));
}
-struct cgroup_subsys net_cls_subsys = {
- .name = "net_cls",
- .create = cgrp_create,
- .destroy = cgrp_destroy,
- .populate = cgrp_populate,
- .subsys_id = net_cls_subsys_id,
-};
-
struct cls_cgroup_head
{
u32 handle;
@@ -277,12 +288,19 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {
static int __init init_cgroup_cls(void)
{
- return register_tcf_proto_ops(&cls_cgroup_ops);
+ int ret = register_tcf_proto_ops(&cls_cgroup_ops);
+ if (ret)
+ return ret;
+ ret = cgroup_load_subsys(&net_cls_subsys);
+ if (ret)
+ unregister_tcf_proto_ops(&cls_cgroup_ops);
+ return ret;
}
static void __exit exit_cgroup_cls(void)
{
unregister_tcf_proto_ops(&cls_cgroup_ops);
+ cgroup_unload_subsys(&net_cls_subsys);
}
module_init(init_cgroup_cls);
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2009-12-31 5:10 [PATCH v4 0/4] cgroups: support for module-loadable subsystems Ben Blum
` (3 preceding siblings ...)
2009-12-31 5:15 ` [PATCH v4 4/4] cgroups: net_cls as module Ben Blum
@ 2010-01-07 0:04 ` Andrew Morton
2010-01-07 1:26 ` Ben Blum
4 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2010-01-07 0:04 UTC (permalink / raw)
To: Ben Blum; +Cc: linux-kernel, containers, lizf, menage
On Thu, 31 Dec 2009 00:10:50 -0500
Ben Blum <bblum@andrew.cmu.edu> wrote:
> This patch series implements support for building, loading, and
> unloading subsystems as modules, both within and outside the kernel
> source tree. It provides an interface cgroup_load_subsys() and
> cgroup_unload_subsys() which modular subsystems can use to register and
> depart during runtime. The net_cls classifier subsystem serves as the
> example for a subsystem which can be converted into a module using these
> changes.
What is the value in this? What are the usage scenarios? Why does the
benefit of this change exceed the cost/risk/etc of merging it?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 0:04 ` [PATCH v4 0/4] cgroups: support for module-loadable subsystems Andrew Morton
@ 2010-01-07 1:26 ` Ben Blum
2010-01-07 3:07 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 28+ messages in thread
From: Ben Blum @ 2010-01-07 1:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, containers, lizf, menage
On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> On Thu, 31 Dec 2009 00:10:50 -0500
> Ben Blum <bblum@andrew.cmu.edu> wrote:
>
> > This patch series implements support for building, loading, and
> > unloading subsystems as modules, both within and outside the kernel
> > source tree. It provides an interface cgroup_load_subsys() and
> > cgroup_unload_subsys() which modular subsystems can use to register and
> > depart during runtime. The net_cls classifier subsystem serves as the
> > example for a subsystem which can be converted into a module using these
> > changes.
>
> What is the value in this? What are the usage scenarios? Why does the
> benefit of this change exceed the cost/risk/etc of merging it?
As discussed in the first posting of these patches, this provides the
ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
would have already been a module except for a lack of support from
cgroups, and the change also allows other module-loadable classifiers
to add subsystems of their own.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 1:26 ` Ben Blum
@ 2010-01-07 3:07 ` KAMEZAWA Hiroyuki
2010-01-07 6:42 ` Li Zefan
0 siblings, 1 reply; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-07 3:07 UTC (permalink / raw)
To: Ben Blum; +Cc: Andrew Morton, menage, containers, linux-kernel
On Wed, 6 Jan 2010 20:26:06 -0500
Ben Blum <bblum@andrew.cmu.edu> wrote:
> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > On Thu, 31 Dec 2009 00:10:50 -0500
> > Ben Blum <bblum@andrew.cmu.edu> wrote:
> >
> > > This patch series implements support for building, loading, and
> > > unloading subsystems as modules, both within and outside the kernel
> > > source tree. It provides an interface cgroup_load_subsys() and
> > > cgroup_unload_subsys() which modular subsystems can use to register and
> > > depart during runtime. The net_cls classifier subsystem serves as the
> > > example for a subsystem which can be converted into a module using these
> > > changes.
> >
> > What is the value in this? What are the usage scenarios? Why does the
> > benefit of this change exceed the cost/risk/etc of merging it?
>
> As discussed in the first posting of these patches, this provides the
> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> would have already been a module except for a lack of support from
> cgroups, and the change also allows other module-loadable classifiers
> to add subsystems of their own.
Hmm, do you have your own module in plan ?
Thanks,
-Kame
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 3:07 ` KAMEZAWA Hiroyuki
@ 2010-01-07 6:42 ` Li Zefan
2010-01-07 7:16 ` KAMEZAWA Hiroyuki
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Li Zefan @ 2010-01-07 6:42 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Ben Blum, Andrew Morton, menage, containers, linux-kernel
KAMEZAWA Hiroyuki wrote:
> On Wed, 6 Jan 2010 20:26:06 -0500
> Ben Blum <bblum@andrew.cmu.edu> wrote:
>
>> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
>>> On Thu, 31 Dec 2009 00:10:50 -0500
>>> Ben Blum <bblum@andrew.cmu.edu> wrote:
>>>
>>>> This patch series implements support for building, loading, and
>>>> unloading subsystems as modules, both within and outside the kernel
>>>> source tree. It provides an interface cgroup_load_subsys() and
>>>> cgroup_unload_subsys() which modular subsystems can use to register and
>>>> depart during runtime. The net_cls classifier subsystem serves as the
>>>> example for a subsystem which can be converted into a module using these
>>>> changes.
>>> What is the value in this? What are the usage scenarios? Why does the
>>> benefit of this change exceed the cost/risk/etc of merging it?
>> As discussed in the first posting of these patches, this provides the
>> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
>> would have already been a module except for a lack of support from
>> cgroups, and the change also allows other module-loadable classifiers
>> to add subsystems of their own.
>
> Hmm, do you have your own module in plan ?
>
Maybe the new blkio_cgroup can also be made module-able.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 6:42 ` Li Zefan
@ 2010-01-07 7:16 ` KAMEZAWA Hiroyuki
2010-01-07 7:48 ` Ben Blum
2010-01-07 8:14 ` Ben Blum
2010-01-08 5:27 ` Ben Blum
2 siblings, 1 reply; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-07 7:16 UTC (permalink / raw)
To: Li Zefan; +Cc: Ben Blum, Andrew Morton, menage, containers, linux-kernel
On Thu, 07 Jan 2010 14:42:19 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <bblum@andrew.cmu.edu> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <bblum@andrew.cmu.edu> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.
>
Hmm, I read the patch slightly. I'm not enough expert to review this patch..
I requst following as TODO.
(No objection to the direction/patch.)
1. Add documentation about load/unlod module.
It seems module unloading will not succuess while subsystem is mounted.
Right ?
2. Making this to be reasonable value.
#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
I can't find why.
3. show whehter a subsys is a loadable module or not via /proc/cgroups
4. how to test ? load/unload NET_CLS is enough ?
Last one is question.
5. Is following path is safe ?
find_css_set() {
....
read_lock(&css_set_lock);
get template including pointer
read_unlock(&css_set_lock);
use template to build new css_set.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 7:16 ` KAMEZAWA Hiroyuki
@ 2010-01-07 7:48 ` Ben Blum
2010-01-07 7:51 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 28+ messages in thread
From: Ben Blum @ 2010-01-07 7:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Li Zefan, Andrew Morton, menage, containers, linux-kernel, bblum
On Thu, Jan 07, 2010 at 04:16:27PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 07 Jan 2010 14:42:19 +0800
> Li Zefan <lizf@cn.fujitsu.com> wrote:
>
> > KAMEZAWA Hiroyuki wrote:
> > > On Wed, 6 Jan 2010 20:26:06 -0500
> > > Ben Blum <bblum@andrew.cmu.edu> wrote:
> > >
> > >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > >>> On Thu, 31 Dec 2009 00:10:50 -0500
> > >>> Ben Blum <bblum@andrew.cmu.edu> wrote:
> > >>>
> > >>>> This patch series implements support for building, loading, and
> > >>>> unloading subsystems as modules, both within and outside the kernel
> > >>>> source tree. It provides an interface cgroup_load_subsys() and
> > >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> > >>>> depart during runtime. The net_cls classifier subsystem serves as the
> > >>>> example for a subsystem which can be converted into a module using these
> > >>>> changes.
> > >>> What is the value in this? What are the usage scenarios? Why does the
> > >>> benefit of this change exceed the cost/risk/etc of merging it?
> > >> As discussed in the first posting of these patches, this provides the
> > >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> > >> would have already been a module except for a lack of support from
> > >> cgroups, and the change also allows other module-loadable classifiers
> > >> to add subsystems of their own.
> > >
> > > Hmm, do you have your own module in plan ?
> > >
> >
> > Maybe the new blkio_cgroup can also be made module-able.
> >
>
> Hmm, I read the patch slightly. I'm not enough expert to review this patch..
>
> I requst following as TODO.
> (No objection to the direction/patch.)
>
> 1. Add documentation about load/unlod module.
could perhaps use more, i suppose.
> It seems module unloading will not succuess while subsystem is mounted.
> Right ?
yes, when you mount it, it takes a reference on the module, so you get
"module is in use".
> 2. Making this to be reasonable value.
> #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> I can't find why.
"We limit to this many since cgroupfs_root has subsys_bits to keep track
of all of them." should it be less, perhaps? the memory footprint is not
great, it is true, but implementing dynamically sized subsystem tracking
data structures requires much cleverer code in many places.
> 3. show whehter a subsys is a loadable module or not via /proc/cgroups
with just "y" or "n"? possible, and probably easy. do note that since
they are sorted by subsys_id, all the ones above a certain line will be
"n" and all below will be "y".
> 4. how to test ? load/unload NET_CLS is enough ?
load, mount, remount, unmount, mount with different combinations, unload
was the general approach I took, plus using gdb to examine state.
>
> Last one is question.
>
> 5. Is following path is safe ?
>
> find_css_set() {
> ....
> read_lock(&css_set_lock);
> get template including pointer
> read_unlock(&css_set_lock);
>
> use template to build new css_set.
should be, because that code is dealing with a cgrp's/css's specific
->subsys array, which state is protected by refcounts held by the
already mounted hierarchy, and the other entries in the array are not
cared about by the particular css in question.
>
>
> Thanks,
> -Kame
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 7:48 ` Ben Blum
@ 2010-01-07 7:51 ` KAMEZAWA Hiroyuki
2010-01-07 8:04 ` Ben Blum
0 siblings, 1 reply; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-07 7:51 UTC (permalink / raw)
To: Ben Blum; +Cc: Li Zefan, Andrew Morton, menage, containers, linux-kernel
On Thu, 7 Jan 2010 02:48:12 -0500
Ben Blum <bblum@andrew.cmu.edu> wrote:
> > 2. Making this to be reasonable value.
> > #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> > I can't find why.
>
> "We limit to this many since cgroupfs_root has subsys_bits to keep track
> of all of them." should it be less, perhaps?
It's ok if it's clear that
"this decistion is done by implementation choice, not by cgroup's nature"
> the memory footprint is not
> great, it is true, but implementing dynamically sized subsystem tracking
> data structures requires much cleverer code in many places.
>
yes. I don't request that.
> > 3. show whehter a subsys is a loadable module or not via /proc/cgroups
>
> with just "y" or "n"? possible, and probably easy. do note that since
> they are sorted by subsys_id, all the ones above a certain line will be
> "n" and all below will be "y".
>
yes.
#subsys_name hierarchy num_cgroups enabled module
cpuset 0 1 1 0
and 0/1 to show y/n ? (but this cause interface incompatibility...)
Thanks,
-Kame
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 7:51 ` KAMEZAWA Hiroyuki
@ 2010-01-07 8:04 ` Ben Blum
0 siblings, 0 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-07 8:04 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Li Zefan, Andrew Morton, menage, containers, linux-kernel
On Thu, Jan 07, 2010 at 04:51:17PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 7 Jan 2010 02:48:12 -0500
> Ben Blum <bblum@andrew.cmu.edu> wrote:
>
> > > 2. Making this to be reasonable value.
> > > #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> > > I can't find why.
> >
> > "We limit to this many since cgroupfs_root has subsys_bits to keep track
> > of all of them." should it be less, perhaps?
>
> It's ok if it's clear that
> "this decistion is done by implementation choice, not by cgroup's nature"
mhm, well, it is the upper limit due to nature, but why it and not a
smaller number is by choice.
>
> > the memory footprint is not
> > great, it is true, but implementing dynamically sized subsystem tracking
> > data structures requires much cleverer code in many places.
> >
> yes. I don't request that.
it might be possible to have a config option as CGROUP_EXTRA_SUBSYSTEMS
(with max being 64 or 32) which would add slots for subsystems outside
of the kernel tree, to avoid using up a lot of blank slots in typical
use cases. not entirely sure how to implement it in the scope of the
configuration world, just speculation.
> > > 3. show whehter a subsys is a loadable module or not via /proc/cgroups
> >
> > with just "y" or "n"? possible, and probably easy. do note that since
> > they are sorted by subsys_id, all the ones above a certain line will be
> > "n" and all below will be "y".
> >
> yes.
>
> #subsys_name hierarchy num_cgroups enabled module
> cpuset 0 1 1 0
>
> and 0/1 to show y/n ? (but this cause interface incompatibility...)
well, format should be agreed upon. 1/0 would be consistent with the
rest of the output.
>
>
> Thanks,
> -Kame
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 6:42 ` Li Zefan
2010-01-07 7:16 ` KAMEZAWA Hiroyuki
@ 2010-01-07 8:14 ` Ben Blum
2010-01-07 8:22 ` Ben Blum
2010-01-08 5:27 ` Ben Blum
2 siblings, 1 reply; 28+ messages in thread
From: Ben Blum @ 2010-01-07 8:14 UTC (permalink / raw)
To: Li Zefan; +Cc: KAMEZAWA Hiroyuki, Andrew Morton, menage, containers,
linux-kernel
On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <bblum@andrew.cmu.edu> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <bblum@andrew.cmu.edu> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.
certainly looks promising. one thing I note while looking over it is
that it wants .use_id = 1, and the load_subsys interface neglects to
make a call to init_idr. adding calls to cgroup_init_idr has a few
possible complications... what use does blkio have for use_id?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 8:14 ` Ben Blum
@ 2010-01-07 8:22 ` Ben Blum
0 siblings, 0 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-07 8:22 UTC (permalink / raw)
To: Li Zefan
Cc: KAMEZAWA Hiroyuki, Andrew Morton, menage, containers,
linux-kernel, bblum
On Thu, Jan 07, 2010 at 03:14:32AM -0500, Ben Blum wrote:
> On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> > KAMEZAWA Hiroyuki wrote:
> > > On Wed, 6 Jan 2010 20:26:06 -0500
> > > Ben Blum <bblum@andrew.cmu.edu> wrote:
> > >
> > >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > >>> On Thu, 31 Dec 2009 00:10:50 -0500
> > >>> Ben Blum <bblum@andrew.cmu.edu> wrote:
> > >>>
> > >>>> This patch series implements support for building, loading, and
> > >>>> unloading subsystems as modules, both within and outside the kernel
> > >>>> source tree. It provides an interface cgroup_load_subsys() and
> > >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> > >>>> depart during runtime. The net_cls classifier subsystem serves as the
> > >>>> example for a subsystem which can be converted into a module using these
> > >>>> changes.
> > >>> What is the value in this? What are the usage scenarios? Why does the
> > >>> benefit of this change exceed the cost/risk/etc of merging it?
> > >> As discussed in the first posting of these patches, this provides the
> > >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> > >> would have already been a module except for a lack of support from
> > >> cgroups, and the change also allows other module-loadable classifiers
> > >> to add subsystems of their own.
> > >
> > > Hmm, do you have your own module in plan ?
> > >
> >
> > Maybe the new blkio_cgroup can also be made module-able.
>
> certainly looks promising. one thing I note while looking over it is
> that it wants .use_id = 1, and the load_subsys interface neglects to
> make a call to init_idr. adding calls to cgroup_init_idr has a few
> possible complications... what use does blkio have for use_id?
oh, oops, while tracing the calls that init_idr does i ended up looking
at the wrong functions and made it out to be harder than it was (in
several places, even). still it would need to be included in load_subsys
and the single error case handled appropriately.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems
2010-01-07 6:42 ` Li Zefan
2010-01-07 7:16 ` KAMEZAWA Hiroyuki
2010-01-07 8:14 ` Ben Blum
@ 2010-01-08 5:27 ` Ben Blum
2010-01-08 5:29 ` [RFC] [PATCH 1/2] cgroups: modular subsystems support for use_id Ben Blum
2010-01-08 5:30 ` [RFC] [PATCH 2/2] cgroups: blkio subsystem as module Ben Blum
2 siblings, 2 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-08 5:27 UTC (permalink / raw)
To: Li Zefan, axboe, ryov, vgoyal
Cc: KAMEZAWA Hiroyuki, Andrew Morton, menage, containers,
linux-kernel, bblum
On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <bblum@andrew.cmu.edu> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <bblum@andrew.cmu.edu> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.
Ok, the following two patches make this happen (or at least pretend to
well enough to fool me). The first one adds use_id initialization in
cgroup_load_subsys, and the second rearranges config options and some
code as appropriate in block/ and adds EXPORT_SYMBOLs in cgroup.c.
-- bblum
---
block/Kconfig | 2 -
block/Kconfig.iosched | 2 -
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++-----------
block/blk-cgroup.h | 10 ++++++--
include/linux/iocontext.h | 2 -
kernel/cgroup.c | 31 +++++++++++++++++++++-----
6 files changed, 77 insertions(+), 23 deletions(-)
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC] [PATCH 1/2] cgroups: modular subsystems support for use_id
2010-01-08 5:27 ` Ben Blum
@ 2010-01-08 5:29 ` Ben Blum
2010-01-08 5:30 ` [RFC] [PATCH 2/2] cgroups: blkio subsystem as module Ben Blum
1 sibling, 0 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-08 5:29 UTC (permalink / raw)
To: Li Zefan, KAMEZAWA Hiroyuki, Andrew Morton, menage, containers,
linux-kernel, bblum
[-- Attachment #1: Type: text/plain, Size: 2190 bytes --]
On Fri, Jan 08, 2010 at 12:27:34AM -0500, Ben Blum wrote:
> On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> > KAMEZAWA Hiroyuki wrote:
> > > On Wed, 6 Jan 2010 20:26:06 -0500
> > > Ben Blum <bblum@andrew.cmu.edu> wrote:
> > >
> > >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > >>> On Thu, 31 Dec 2009 00:10:50 -0500
> > >>> Ben Blum <bblum@andrew.cmu.edu> wrote:
> > >>>
> > >>>> This patch series implements support for building, loading, and
> > >>>> unloading subsystems as modules, both within and outside the kernel
> > >>>> source tree. It provides an interface cgroup_load_subsys() and
> > >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> > >>>> depart during runtime. The net_cls classifier subsystem serves as the
> > >>>> example for a subsystem which can be converted into a module using these
> > >>>> changes.
> > >>> What is the value in this? What are the usage scenarios? Why does the
> > >>> benefit of this change exceed the cost/risk/etc of merging it?
> > >> As discussed in the first posting of these patches, this provides the
> > >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> > >> would have already been a module except for a lack of support from
> > >> cgroups, and the change also allows other module-loadable classifiers
> > >> to add subsystems of their own.
> > >
> > > Hmm, do you have your own module in plan ?
> > >
> >
> > Maybe the new blkio_cgroup can also be made module-able.
>
> Ok, the following two patches make this happen (or at least pretend to
> well enough to fool me). The first one adds use_id initialization in
> cgroup_load_subsys, and the second rearranges config options and some
> code as appropriate in block/ and adds EXPORT_SYMBOLs in cgroup.c.
>
> -- bblum
>
> ---
> block/Kconfig | 2 -
> block/Kconfig.iosched | 2 -
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++-----------
> block/blk-cgroup.h | 10 ++++++--
> include/linux/iocontext.h | 2 -
> kernel/cgroup.c | 31 +++++++++++++++++++++-----
> 6 files changed, 77 insertions(+), 23 deletions(-)
>
>
[-- Attachment #2: cgroups-module-use_id-support.patch --]
[-- Type: text/plain, Size: 2441 bytes --]
This patch adds support for subsys.use_id in module-loadable subsystems.
From: Ben Blum <bblum@andrew.cmu.edu>
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
---
kernel/cgroup.c | 23 +++++++++++++++++------
1 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index cc2e1f6..5af59eb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -240,7 +240,8 @@ struct cg_cgroup_link {
static struct css_set init_css_set;
static struct cg_cgroup_link init_css_set_link;
-static int cgroup_subsys_init_idr(struct cgroup_subsys *ss);
+static int cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *css);
/* css_set_lock protects the list of css_set objects, and the
* chain of tasks off each css_set. Nests outside task->alloc_lock
@@ -3390,6 +3391,16 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
mutex_unlock(&cgroup_mutex);
return PTR_ERR(css);
}
+ /* call init_idr here because it has a failure case */
+ if (ss->use_id) {
+ int ret = cgroup_init_idr(ss, css);
+ if (ret) {
+ ss->destroy(ss, dummytop);
+ subsys[i] = NULL;
+ mutex_unlock(&cgroup_mutex);
+ return ret;
+ }
+ }
list_add(&ss->sibling, &rootnode.subsys_list);
ss->root = &rootnode;
@@ -3484,7 +3495,8 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
/*
* remove subsystem's css from the dummytop and free it - need to free
* before marking as null because ss->destroy needs the cgrp->subsys
- * pointer to find their state.
+ * pointer to find their state. note that this also takes care of
+ * freeing the css_id.
*/
ss->destroy(ss, dummytop);
dummytop->subsys[ss->subsys_id] = NULL;
@@ -3563,7 +3575,7 @@ int __init cgroup_init(void)
if (!ss->early_init)
cgroup_init_subsys(ss);
if (ss->use_id)
- cgroup_subsys_init_idr(ss);
+ cgroup_init_idr(ss, init_css_set.subsys[ss->subsys_id]);
}
/* Add init_css_set to the hash table */
@@ -4231,15 +4243,14 @@ err_out:
}
-static int __init cgroup_subsys_init_idr(struct cgroup_subsys *ss)
+static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *rootcss)
{
struct css_id *newid;
- struct cgroup_subsys_state *rootcss;
spin_lock_init(&ss->id_lock);
idr_init(&ss->idr);
- rootcss = init_css_set.subsys[ss->subsys_id];
newid = get_new_cssid(ss, 0);
if (IS_ERR(newid))
return PTR_ERR(newid);
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-08 5:27 ` Ben Blum
2010-01-08 5:29 ` [RFC] [PATCH 1/2] cgroups: modular subsystems support for use_id Ben Blum
@ 2010-01-08 5:30 ` Ben Blum
2010-01-08 15:10 ` Vivek Goyal
2010-01-08 16:33 ` Vivek Goyal
1 sibling, 2 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-08 5:30 UTC (permalink / raw)
To: Li Zefan, axboe, ryov, vgoyal, KAMEZAWA Hiroyuki, Andrew Morton,
menage, containers, linux-kernel, bblum
[-- Attachment #1: cgroups-blkio-as-module.patch --]
[-- Type: text/plain, Size: 7371 bytes --]
Convert blk-cgroup to be buildable as a module
From: Ben Blum <bblum@andrew.cmu.edu>
This patch modifies the Block I/O cgroup subsystem to be able to be built as a
module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
enhanced to support the new module dependency.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
---
block/Kconfig | 2 +-
block/Kconfig.iosched | 2 +-
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
block/blk-cgroup.h | 10 +++++++-
include/linux/iocontext.h | 2 +-
kernel/cgroup.c | 8 +++++++
6 files changed, 60 insertions(+), 17 deletions(-)
diff --git a/block/Kconfig b/block/Kconfig
index e20fbde..62a5921 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
Protection. If in doubt, say N.
config BLK_CGROUP
- bool
+ tristate
depends on CGROUPS
default n
---help---
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index b71abfb..fc71cf0 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
config IOSCHED_CFQ
tristate "CFQ I/O scheduler"
+ select BLK_CGROUP if CFQ_GROUP_IOSCHED
default y
---help---
The CFQ I/O scheduler tries to distribute bandwidth equally
@@ -35,7 +36,6 @@ config IOSCHED_CFQ
config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && CGROUPS
- select BLK_CGROUP
default n
---help---
Enable group IO scheduling in CFQ.
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1fa2654..6c73380 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
EXPORT_SYMBOL_GPL(blkio_root_cgroup);
+static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
+ struct cgroup *);
+static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
+ struct task_struct *, bool);
+static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
+ struct cgroup *, struct task_struct *, bool);
+static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
+static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
+
+struct cgroup_subsys blkio_subsys = {
+ .name = "blkio",
+ .create = blkiocg_create,
+ .can_attach = blkiocg_can_attach,
+ .attach = blkiocg_attach,
+ .destroy = blkiocg_destroy,
+ .populate = blkiocg_populate,
+#ifdef CONFIG_BLK_CGROUP
+ /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
+ .subsys_id = blkio_subsys_id,
+#endif
+ .use_id = 1,
+ .module = THIS_MODULE,
+};
+EXPORT_SYMBOL_GPL(blkio_subsys);
+
bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
{
if (!css_tryget(&blkcg->css))
@@ -267,7 +292,8 @@ remove_entry:
done:
free_css_id(&blkio_subsys, &blkcg->css);
rcu_read_unlock();
- kfree(blkcg);
+ if (blkcg != &blkio_root_cgroup)
+ kfree(blkcg);
}
static struct cgroup_subsys_state *
@@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
task_unlock(tsk);
}
-struct cgroup_subsys blkio_subsys = {
- .name = "blkio",
- .create = blkiocg_create,
- .can_attach = blkiocg_can_attach,
- .attach = blkiocg_attach,
- .destroy = blkiocg_destroy,
- .populate = blkiocg_populate,
- .subsys_id = blkio_subsys_id,
- .use_id = 1,
-};
-
void blkio_policy_register(struct blkio_policy_type *blkiop)
{
spin_lock(&blkio_list_lock);
@@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
spin_unlock(&blkio_list_lock);
}
EXPORT_SYMBOL_GPL(blkio_policy_unregister);
+
+static int __init init_cgroup_blkio(void)
+{
+ return cgroup_load_subsys(&blkio_subsys);
+}
+
+static void __exit exit_cgroup_blkio(void)
+{
+ cgroup_unload_subsys(&blkio_subsys);
+}
+
+module_init(init_cgroup_blkio);
+module_exit(exit_cgroup_blkio);
+MODULE_LICENSE("GPL");
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 4d316df..57648c6 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -15,7 +15,13 @@
#include <linux/cgroup.h>
-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
+
+#ifndef CONFIG_BLK_CGROUP
+/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
+extern struct cgroup_subsys blkio_subsys;
+#define blkio_subsys_id blkio_subsys.subsys_id
+#endif
struct blkio_cgroup {
struct cgroup_subsys_state css;
@@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
struct blkio_group *blkg, unsigned long dequeue) {}
#endif
-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
extern struct blkio_cgroup blkio_root_cgroup;
extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index a632359..b9f109d 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -68,7 +68,7 @@ struct io_context {
unsigned short ioprio;
unsigned short ioprio_changed;
-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
unsigned short cgroup_changed;
#endif
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5af59eb..391ff41 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -690,6 +690,7 @@ void cgroup_lock(void)
{
mutex_lock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_lock);
/**
* cgroup_unlock - release lock on cgroup changes
@@ -700,6 +701,7 @@ void cgroup_unlock(void)
{
mutex_unlock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_unlock);
/*
* A couple of forward declarations required, due to cyclic reference loop:
@@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
}
return true;
}
+EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
const char *buffer)
@@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
rcu_read_unlock();
WARN_ON_ONCE(val < 1);
}
+EXPORT_SYMBOL_GPL(__css_put);
/*
* Notify userspace when a cgroup is released, by running the
@@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
return cssid->id;
return 0;
}
+EXPORT_SYMBOL_GPL(css_id);
unsigned short css_depth(struct cgroup_subsys_state *css)
{
@@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
return cssid->depth;
return 0;
}
+EXPORT_SYMBOL_GPL(css_depth);
bool css_is_ancestor(struct cgroup_subsys_state *child,
const struct cgroup_subsys_state *root)
@@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
spin_unlock(&ss->id_lock);
call_rcu(&id->rcu_head, __free_css_id_cb);
}
+EXPORT_SYMBOL_GPL(free_css_id);
/*
* This is called by init or create(). Then, calls to this function are
@@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
return rcu_dereference(cssid->css);
}
+EXPORT_SYMBOL_GPL(css_lookup);
/**
* css_get_next - lookup next cgroup under specified hierarchy.
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-08 5:30 ` [RFC] [PATCH 2/2] cgroups: blkio subsystem as module Ben Blum
@ 2010-01-08 15:10 ` Vivek Goyal
2010-01-12 0:21 ` KAMEZAWA Hiroyuki
2010-01-08 16:33 ` Vivek Goyal
1 sibling, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2010-01-08 15:10 UTC (permalink / raw)
To: Li Zefan, axboe, ryov, KAMEZAWA Hiroyuki, Andrew Morton, menage,
containers, linux-kernel
On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <bblum@andrew.cmu.edu>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
>
Hi Ben,
I will give this patch a try.
So from blk-cgroup perspective, the advantage of allowing it as module
will be that we can save some memory if we are not using the controller?
> Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
> ---
> block/Kconfig | 2 +-
> block/Kconfig.iosched | 2 +-
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
> block/blk-cgroup.h | 10 +++++++-
> include/linux/iocontext.h | 2 +-
> kernel/cgroup.c | 8 +++++++
> 6 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index e20fbde..62a5921 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
> Protection. If in doubt, say N.
>
> config BLK_CGROUP
> - bool
> + tristate
> depends on CGROUPS
> default n
> ---help---
> diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
> index b71abfb..fc71cf0 100644
> --- a/block/Kconfig.iosched
> +++ b/block/Kconfig.iosched
> @@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
>
> config IOSCHED_CFQ
> tristate "CFQ I/O scheduler"
> + select BLK_CGROUP if CFQ_GROUP_IOSCHED
> default y
> ---help---
> The CFQ I/O scheduler tries to distribute bandwidth equally
> @@ -35,7 +36,6 @@ config IOSCHED_CFQ
> config CFQ_GROUP_IOSCHED
> bool "CFQ Group Scheduling support"
> depends on IOSCHED_CFQ && CGROUPS
> - select BLK_CGROUP
> default n
> ---help---
> Enable group IO scheduling in CFQ.
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1fa2654..6c73380 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
> struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
> EXPORT_SYMBOL_GPL(blkio_root_cgroup);
>
> +static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
> + struct cgroup *);
> +static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
> + struct task_struct *, bool);
> +static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
> + struct cgroup *, struct task_struct *, bool);
> +static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
> +static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
> +
> +struct cgroup_subsys blkio_subsys = {
> + .name = "blkio",
> + .create = blkiocg_create,
> + .can_attach = blkiocg_can_attach,
> + .attach = blkiocg_attach,
> + .destroy = blkiocg_destroy,
> + .populate = blkiocg_populate,
> +#ifdef CONFIG_BLK_CGROUP
> + /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
> + .subsys_id = blkio_subsys_id,
> +#endif
> + .use_id = 1,
> + .module = THIS_MODULE,
> +};
> +EXPORT_SYMBOL_GPL(blkio_subsys);
Why are you moving blkio_subsys declaration up in the file. That way now
you have to declare all the functions before you can instanciate
blkio_subsys?
> +
> bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
> {
> if (!css_tryget(&blkcg->css))
> @@ -267,7 +292,8 @@ remove_entry:
> done:
> free_css_id(&blkio_subsys, &blkcg->css);
> rcu_read_unlock();
> - kfree(blkcg);
> + if (blkcg != &blkio_root_cgroup)
> + kfree(blkcg);
> }
>
> static struct cgroup_subsys_state *
> @@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
> task_unlock(tsk);
> }
>
> -struct cgroup_subsys blkio_subsys = {
> - .name = "blkio",
> - .create = blkiocg_create,
> - .can_attach = blkiocg_can_attach,
> - .attach = blkiocg_attach,
> - .destroy = blkiocg_destroy,
> - .populate = blkiocg_populate,
> - .subsys_id = blkio_subsys_id,
> - .use_id = 1,
> -};
> -
> void blkio_policy_register(struct blkio_policy_type *blkiop)
> {
> spin_lock(&blkio_list_lock);
> @@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
> spin_unlock(&blkio_list_lock);
> }
> EXPORT_SYMBOL_GPL(blkio_policy_unregister);
> +
> +static int __init init_cgroup_blkio(void)
> +{
> + return cgroup_load_subsys(&blkio_subsys);
> +}
> +
> +static void __exit exit_cgroup_blkio(void)
> +{
> + cgroup_unload_subsys(&blkio_subsys);
> +}
> +
> +module_init(init_cgroup_blkio);
> +module_exit(exit_cgroup_blkio);
> +MODULE_LICENSE("GPL");
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index 4d316df..57648c6 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -15,7 +15,13 @@
>
> #include <linux/cgroup.h>
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +
> +#ifndef CONFIG_BLK_CGROUP
> +/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
> +extern struct cgroup_subsys blkio_subsys;
> +#define blkio_subsys_id blkio_subsys.subsys_id
> +#endif
>
Could above if conditions be simplified to just check for BLK_CGROUP_MODULE.
#ifdef CONFIG_BLK_CGROUP_MODULE
extern struct cgroup_subsys blkio_subsys;
#define blkio_subsys_id blkio_subsys.subsys_id
#endif
Thanks
Vivek
> struct blkio_cgroup {
> struct cgroup_subsys_state css;
> @@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
> struct blkio_group *blkg, unsigned long dequeue) {}
> #endif
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> extern struct blkio_cgroup blkio_root_cgroup;
> extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
> extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
> index a632359..b9f109d 100644
> --- a/include/linux/iocontext.h
> +++ b/include/linux/iocontext.h
> @@ -68,7 +68,7 @@ struct io_context {
> unsigned short ioprio;
> unsigned short ioprio_changed;
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> unsigned short cgroup_changed;
> #endif
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5af59eb..391ff41 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -690,6 +690,7 @@ void cgroup_lock(void)
> {
> mutex_lock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock);
>
> /**
> * cgroup_unlock - release lock on cgroup changes
> @@ -700,6 +701,7 @@ void cgroup_unlock(void)
> {
> mutex_unlock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_unlock);
>
> /*
> * A couple of forward declarations required, due to cyclic reference loop:
> @@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
> }
> return true;
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
>
> static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
> const char *buffer)
> @@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
> rcu_read_unlock();
> WARN_ON_ONCE(val < 1);
> }
> +EXPORT_SYMBOL_GPL(__css_put);
>
> /*
> * Notify userspace when a cgroup is released, by running the
> @@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
> return cssid->id;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_id);
>
> unsigned short css_depth(struct cgroup_subsys_state *css)
> {
> @@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
> return cssid->depth;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_depth);
>
> bool css_is_ancestor(struct cgroup_subsys_state *child,
> const struct cgroup_subsys_state *root)
> @@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
> spin_unlock(&ss->id_lock);
> call_rcu(&id->rcu_head, __free_css_id_cb);
> }
> +EXPORT_SYMBOL_GPL(free_css_id);
>
> /*
> * This is called by init or create(). Then, calls to this function are
> @@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
>
> return rcu_dereference(cssid->css);
> }
> +EXPORT_SYMBOL_GPL(css_lookup);
>
> /**
> * css_get_next - lookup next cgroup under specified hierarchy.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-08 5:30 ` [RFC] [PATCH 2/2] cgroups: blkio subsystem as module Ben Blum
2010-01-08 15:10 ` Vivek Goyal
@ 2010-01-08 16:33 ` Vivek Goyal
2010-01-12 23:34 ` Ben Blum
1 sibling, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2010-01-08 16:33 UTC (permalink / raw)
To: Li Zefan, axboe, ryov, KAMEZAWA Hiroyuki, Andrew Morton, menage,
containers, linux-kernel
On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <bblum@andrew.cmu.edu>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
>
> Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Two quick observations with testing.
You need to EXPORT cgroup_path.
Second, after loading the module, I mounted the blkio controller. But creating
a cgroup directory crashed.
Vivek
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
PGD 1a7fe0067 PUD 1a0f19067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Pid: 3984, comm: mkdir Not tainted 2.6.33-rc3-cgroup-module #2 /ProLiant DL380 G5
RIP: 0010:[<ffffffff8106bcb4>] [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
RSP: 0018:ffff8801a0f17e38 EFLAGS: 00010203
RAX: ffff8801a8cbde00 RBX: ffffffffa02cc8e0 RCX: ffff8801a8cbde00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff8801a770c000 R08: 0000000000000040 R09: ffff8801a0f17d48
R10: ffff8801a0f17ec8 R11: ffffffff8113efb4 R12: 0000000000000001
R13: ffff8801a0f22000 R14: ffff8801948aa980 R15: ffff8801a0f22030
FS: 00007ff9ecf5a710(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 00000001a257e000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mkdir (pid: 3984, threadinfo ffff8801a0f16000, task ffff8801a8e2c040)
Stack:
0000000000000003 000001ed948a8db0 ffff8801a0e1e800 0000000000000000
<0> ffff8801a7745940 ffff8801948a8db0 ffff8801948aa980 0000000000000000
<0> 00000000000001ed 00007fff661cb6eb 0000000000000000 ffffffff810d837e
Call Trace:
[<ffffffff810d837e>] ? vfs_mkdir+0xd5/0x164
[<ffffffff810da743>] ? sys_mkdirat+0x85/0xd1
[<ffffffff810773be>] ? audit_syscall_entry+0x1b9/0x1e4
[<ffffffff8100286b>] ? system_call_fastpath+0x16/0x1b
Code: 4c 24 18 e8 82 f3 ff ff 31 f6 48 3d 00 f0 ff ff 48 89 c1 76 20 85 c0 74 35 45 31 e4 e9 5a 01 00 00 48 8b 7c 24 18 48 63 d6 ff c6 <66> 8b 44 57 20 66 89 44 51 20 44 39 e6 7c e7 8b 51 08 49 63 c4
RIP [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
RSP <ffff8801a0f17e38>
CR2: 0000000000000020
---[ end trace 09377bc5e5c2b563 ]---
> ---
> block/Kconfig | 2 +-
> block/Kconfig.iosched | 2 +-
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
> block/blk-cgroup.h | 10 +++++++-
> include/linux/iocontext.h | 2 +-
> kernel/cgroup.c | 8 +++++++
> 6 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index e20fbde..62a5921 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
> Protection. If in doubt, say N.
>
> config BLK_CGROUP
> - bool
> + tristate
> depends on CGROUPS
> default n
> ---help---
> diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
> index b71abfb..fc71cf0 100644
> --- a/block/Kconfig.iosched
> +++ b/block/Kconfig.iosched
> @@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
>
> config IOSCHED_CFQ
> tristate "CFQ I/O scheduler"
> + select BLK_CGROUP if CFQ_GROUP_IOSCHED
> default y
> ---help---
> The CFQ I/O scheduler tries to distribute bandwidth equally
> @@ -35,7 +36,6 @@ config IOSCHED_CFQ
> config CFQ_GROUP_IOSCHED
> bool "CFQ Group Scheduling support"
> depends on IOSCHED_CFQ && CGROUPS
> - select BLK_CGROUP
> default n
> ---help---
> Enable group IO scheduling in CFQ.
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1fa2654..6c73380 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
> struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
> EXPORT_SYMBOL_GPL(blkio_root_cgroup);
>
> +static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
> + struct cgroup *);
> +static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
> + struct task_struct *, bool);
> +static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
> + struct cgroup *, struct task_struct *, bool);
> +static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
> +static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
> +
> +struct cgroup_subsys blkio_subsys = {
> + .name = "blkio",
> + .create = blkiocg_create,
> + .can_attach = blkiocg_can_attach,
> + .attach = blkiocg_attach,
> + .destroy = blkiocg_destroy,
> + .populate = blkiocg_populate,
> +#ifdef CONFIG_BLK_CGROUP
> + /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
> + .subsys_id = blkio_subsys_id,
> +#endif
> + .use_id = 1,
> + .module = THIS_MODULE,
> +};
> +EXPORT_SYMBOL_GPL(blkio_subsys);
> +
> bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
> {
> if (!css_tryget(&blkcg->css))
> @@ -267,7 +292,8 @@ remove_entry:
> done:
> free_css_id(&blkio_subsys, &blkcg->css);
> rcu_read_unlock();
> - kfree(blkcg);
> + if (blkcg != &blkio_root_cgroup)
> + kfree(blkcg);
> }
>
> static struct cgroup_subsys_state *
> @@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
> task_unlock(tsk);
> }
>
> -struct cgroup_subsys blkio_subsys = {
> - .name = "blkio",
> - .create = blkiocg_create,
> - .can_attach = blkiocg_can_attach,
> - .attach = blkiocg_attach,
> - .destroy = blkiocg_destroy,
> - .populate = blkiocg_populate,
> - .subsys_id = blkio_subsys_id,
> - .use_id = 1,
> -};
> -
> void blkio_policy_register(struct blkio_policy_type *blkiop)
> {
> spin_lock(&blkio_list_lock);
> @@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
> spin_unlock(&blkio_list_lock);
> }
> EXPORT_SYMBOL_GPL(blkio_policy_unregister);
> +
> +static int __init init_cgroup_blkio(void)
> +{
> + return cgroup_load_subsys(&blkio_subsys);
> +}
> +
> +static void __exit exit_cgroup_blkio(void)
> +{
> + cgroup_unload_subsys(&blkio_subsys);
> +}
> +
> +module_init(init_cgroup_blkio);
> +module_exit(exit_cgroup_blkio);
> +MODULE_LICENSE("GPL");
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index 4d316df..57648c6 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -15,7 +15,13 @@
>
> #include <linux/cgroup.h>
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +
> +#ifndef CONFIG_BLK_CGROUP
> +/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
> +extern struct cgroup_subsys blkio_subsys;
> +#define blkio_subsys_id blkio_subsys.subsys_id
> +#endif
>
> struct blkio_cgroup {
> struct cgroup_subsys_state css;
> @@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
> struct blkio_group *blkg, unsigned long dequeue) {}
> #endif
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> extern struct blkio_cgroup blkio_root_cgroup;
> extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
> extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
> index a632359..b9f109d 100644
> --- a/include/linux/iocontext.h
> +++ b/include/linux/iocontext.h
> @@ -68,7 +68,7 @@ struct io_context {
> unsigned short ioprio;
> unsigned short ioprio_changed;
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> unsigned short cgroup_changed;
> #endif
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5af59eb..391ff41 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -690,6 +690,7 @@ void cgroup_lock(void)
> {
> mutex_lock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock);
>
> /**
> * cgroup_unlock - release lock on cgroup changes
> @@ -700,6 +701,7 @@ void cgroup_unlock(void)
> {
> mutex_unlock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_unlock);
>
> /*
> * A couple of forward declarations required, due to cyclic reference loop:
> @@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
> }
> return true;
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
>
> static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
> const char *buffer)
> @@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
> rcu_read_unlock();
> WARN_ON_ONCE(val < 1);
> }
> +EXPORT_SYMBOL_GPL(__css_put);
>
> /*
> * Notify userspace when a cgroup is released, by running the
> @@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
> return cssid->id;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_id);
>
> unsigned short css_depth(struct cgroup_subsys_state *css)
> {
> @@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
> return cssid->depth;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_depth);
>
> bool css_is_ancestor(struct cgroup_subsys_state *child,
> const struct cgroup_subsys_state *root)
> @@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
> spin_unlock(&ss->id_lock);
> call_rcu(&id->rcu_head, __free_css_id_cb);
> }
> +EXPORT_SYMBOL_GPL(free_css_id);
>
> /*
> * This is called by init or create(). Then, calls to this function are
> @@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
>
> return rcu_dereference(cssid->css);
> }
> +EXPORT_SYMBOL_GPL(css_lookup);
>
> /**
> * css_get_next - lookup next cgroup under specified hierarchy.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-08 15:10 ` Vivek Goyal
@ 2010-01-12 0:21 ` KAMEZAWA Hiroyuki
2010-01-14 9:32 ` Balbir Singh
0 siblings, 1 reply; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-12 0:21 UTC (permalink / raw)
To: Vivek Goyal
Cc: Li Zefan, axboe, ryov, Andrew Morton, menage, containers,
linux-kernel
On Fri, 8 Jan 2010 10:10:38 -0500
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> > Convert blk-cgroup to be buildable as a module
> >
> > From: Ben Blum <bblum@andrew.cmu.edu>
> >
> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> > enhanced to support the new module dependency.
> >
>
> Hi Ben,
>
> I will give this patch a try.
>
> So from blk-cgroup perspective, the advantage of allowing it as module
> will be that we can save some memory if we are not using the controller?
>
Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
introduced ?
Thanks,
-Kame
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-08 16:33 ` Vivek Goyal
@ 2010-01-12 23:34 ` Ben Blum
2010-01-12 23:36 ` [PATCH v2 1/2] cgroups: modular subsystems support for use_id Ben Blum
2010-01-12 23:37 ` [PATCH v2 2/2] cgroups: blkio subsystem as module Ben Blum
0 siblings, 2 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-12 23:34 UTC (permalink / raw)
To: Vivek Goyal
Cc: Li Zefan, axboe, ryov, KAMEZAWA Hiroyuki, Andrew Morton, menage,
containers, linux-kernel, bblum
On Fri, Jan 08, 2010 at 11:33:52AM -0500, Vivek Goyal wrote:
> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> > Convert blk-cgroup to be buildable as a module
> >
> > From: Ben Blum <bblum@andrew.cmu.edu>
> >
> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> > enhanced to support the new module dependency.
> >
> > Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
>
> Two quick observations with testing.
>
> You need to EXPORT cgroup_path.
>
> Second, after loading the module, I mounted the blkio controller. But creating
> a cgroup directory crashed.
>
> Vivek
argh, good catches on both of them. didn't test with DEBUG_CFQ_IOSCHED
(for cgroup_path) or with making a sub-cgroup (for the crash); shame on
me. turns out it crashed because I had init_idr before init_css_set, and
init_css_set sets css->id = NULL explicitly. fixed patches forthcoming.
-- bblum
---
block/Kconfig | 2 -
block/Kconfig.iosched | 2 -
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++-----------
block/blk-cgroup.h | 10 ++++++--
include/linux/iocontext.h | 2 -
kernel/cgroup.c | 34 ++++++++++++++++++++++++-----
6 files changed, 80 insertions(+), 23 deletions(-)
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 1/2] cgroups: modular subsystems support for use_id
2010-01-12 23:34 ` Ben Blum
@ 2010-01-12 23:36 ` Ben Blum
2010-01-12 23:37 ` [PATCH v2 2/2] cgroups: blkio subsystem as module Ben Blum
1 sibling, 0 replies; 28+ messages in thread
From: Ben Blum @ 2010-01-12 23:36 UTC (permalink / raw)
To: Ben Blum
Cc: Vivek Goyal, Li Zefan, axboe, ryov, KAMEZAWA Hiroyuki,
Andrew Morton, menage, containers, linux-kernel
[-- Attachment #1: cgroups-module-use_id-support.patch --]
[-- Type: text/plain, Size: 2626 bytes --]
This patch adds support for subsys.use_id in module-loadable subsystems.
From: Ben Blum <bblum@andrew.cmu.edu>
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
---
kernel/cgroup.c | 25 +++++++++++++++++++------
1 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index cc2e1f6..b4ae6ef 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -240,7 +240,8 @@ struct cg_cgroup_link {
static struct css_set init_css_set;
static struct cg_cgroup_link init_css_set_link;
-static int cgroup_subsys_init_idr(struct cgroup_subsys *ss);
+static int cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *css);
/* css_set_lock protects the list of css_set objects, and the
* chain of tasks off each css_set. Nests outside task->alloc_lock
@@ -3396,6 +3397,18 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
/* our new subsystem will be attached to the dummy hierarchy. */
init_cgroup_css(css, ss, dummytop);
+ /* init_idr must be after init_cgroup_css because it sets css->id. */
+ if (ss->use_id) {
+ int ret = cgroup_init_idr(ss, css);
+ if (ret) {
+ dummytop->subsys[ss->subsys_id] = NULL;
+ ss->destroy(ss, dummytop);
+ subsys[i] = NULL;
+ mutex_unlock(&cgroup_mutex);
+ return ret;
+ }
+ }
+
/*
* Now we need to entangle the css into the existing css_sets. unlike
* in cgroup_init_subsys, there are now multiple css_sets, so each one
@@ -3484,7 +3497,8 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
/*
* remove subsystem's css from the dummytop and free it - need to free
* before marking as null because ss->destroy needs the cgrp->subsys
- * pointer to find their state.
+ * pointer to find their state. note that this also takes care of
+ * freeing the css_id.
*/
ss->destroy(ss, dummytop);
dummytop->subsys[ss->subsys_id] = NULL;
@@ -3563,7 +3577,7 @@ int __init cgroup_init(void)
if (!ss->early_init)
cgroup_init_subsys(ss);
if (ss->use_id)
- cgroup_subsys_init_idr(ss);
+ cgroup_init_idr(ss, init_css_set.subsys[ss->subsys_id]);
}
/* Add init_css_set to the hash table */
@@ -4231,15 +4245,14 @@ err_out:
}
-static int __init cgroup_subsys_init_idr(struct cgroup_subsys *ss)
+static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *rootcss)
{
struct css_id *newid;
- struct cgroup_subsys_state *rootcss;
spin_lock_init(&ss->id_lock);
idr_init(&ss->idr);
- rootcss = init_css_set.subsys[ss->subsys_id];
newid = get_new_cssid(ss, 0);
if (IS_ERR(newid))
return PTR_ERR(newid);
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 2/2] cgroups: blkio subsystem as module
2010-01-12 23:34 ` Ben Blum
2010-01-12 23:36 ` [PATCH v2 1/2] cgroups: modular subsystems support for use_id Ben Blum
@ 2010-01-12 23:37 ` Ben Blum
2010-01-14 9:15 ` Jens Axboe
1 sibling, 1 reply; 28+ messages in thread
From: Ben Blum @ 2010-01-12 23:37 UTC (permalink / raw)
To: Ben Blum
Cc: Vivek Goyal, Li Zefan, axboe, ryov, KAMEZAWA Hiroyuki,
Andrew Morton, menage, containers, linux-kernel
[-- Attachment #1: cgroups-blkio-as-module.patch --]
[-- Type: text/plain, Size: 7620 bytes --]
Convert blk-cgroup to be buildable as a module
From: Ben Blum <bblum@andrew.cmu.edu>
This patch modifies the Block I/O cgroup subsystem to be able to be built as a
module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
enhanced to support the new module dependency.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
---
block/Kconfig | 2 +-
block/Kconfig.iosched | 2 +-
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
block/blk-cgroup.h | 10 +++++++-
include/linux/iocontext.h | 2 +-
kernel/cgroup.c | 9 ++++++++
6 files changed, 61 insertions(+), 17 deletions(-)
diff --git a/block/Kconfig b/block/Kconfig
index e20fbde..62a5921 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
Protection. If in doubt, say N.
config BLK_CGROUP
- bool
+ tristate
depends on CGROUPS
default n
---help---
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index b71abfb..fc71cf0 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
config IOSCHED_CFQ
tristate "CFQ I/O scheduler"
+ select BLK_CGROUP if CFQ_GROUP_IOSCHED
default y
---help---
The CFQ I/O scheduler tries to distribute bandwidth equally
@@ -35,7 +36,6 @@ config IOSCHED_CFQ
config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && CGROUPS
- select BLK_CGROUP
default n
---help---
Enable group IO scheduling in CFQ.
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1fa2654..6c73380 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
EXPORT_SYMBOL_GPL(blkio_root_cgroup);
+static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
+ struct cgroup *);
+static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
+ struct task_struct *, bool);
+static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
+ struct cgroup *, struct task_struct *, bool);
+static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
+static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
+
+struct cgroup_subsys blkio_subsys = {
+ .name = "blkio",
+ .create = blkiocg_create,
+ .can_attach = blkiocg_can_attach,
+ .attach = blkiocg_attach,
+ .destroy = blkiocg_destroy,
+ .populate = blkiocg_populate,
+#ifdef CONFIG_BLK_CGROUP
+ /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
+ .subsys_id = blkio_subsys_id,
+#endif
+ .use_id = 1,
+ .module = THIS_MODULE,
+};
+EXPORT_SYMBOL_GPL(blkio_subsys);
+
bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
{
if (!css_tryget(&blkcg->css))
@@ -267,7 +292,8 @@ remove_entry:
done:
free_css_id(&blkio_subsys, &blkcg->css);
rcu_read_unlock();
- kfree(blkcg);
+ if (blkcg != &blkio_root_cgroup)
+ kfree(blkcg);
}
static struct cgroup_subsys_state *
@@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
task_unlock(tsk);
}
-struct cgroup_subsys blkio_subsys = {
- .name = "blkio",
- .create = blkiocg_create,
- .can_attach = blkiocg_can_attach,
- .attach = blkiocg_attach,
- .destroy = blkiocg_destroy,
- .populate = blkiocg_populate,
- .subsys_id = blkio_subsys_id,
- .use_id = 1,
-};
-
void blkio_policy_register(struct blkio_policy_type *blkiop)
{
spin_lock(&blkio_list_lock);
@@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
spin_unlock(&blkio_list_lock);
}
EXPORT_SYMBOL_GPL(blkio_policy_unregister);
+
+static int __init init_cgroup_blkio(void)
+{
+ return cgroup_load_subsys(&blkio_subsys);
+}
+
+static void __exit exit_cgroup_blkio(void)
+{
+ cgroup_unload_subsys(&blkio_subsys);
+}
+
+module_init(init_cgroup_blkio);
+module_exit(exit_cgroup_blkio);
+MODULE_LICENSE("GPL");
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 4d316df..57648c6 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -15,7 +15,13 @@
#include <linux/cgroup.h>
-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
+
+#ifndef CONFIG_BLK_CGROUP
+/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
+extern struct cgroup_subsys blkio_subsys;
+#define blkio_subsys_id blkio_subsys.subsys_id
+#endif
struct blkio_cgroup {
struct cgroup_subsys_state css;
@@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
struct blkio_group *blkg, unsigned long dequeue) {}
#endif
-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
extern struct blkio_cgroup blkio_root_cgroup;
extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index a632359..b9f109d 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -68,7 +68,7 @@ struct io_context {
unsigned short ioprio;
unsigned short ioprio_changed;
-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
unsigned short cgroup_changed;
#endif
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b4ae6ef..845a2e7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -690,6 +690,7 @@ void cgroup_lock(void)
{
mutex_lock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_lock);
/**
* cgroup_unlock - release lock on cgroup changes
@@ -700,6 +701,7 @@ void cgroup_unlock(void)
{
mutex_unlock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_unlock);
/*
* A couple of forward declarations required, due to cyclic reference loop:
@@ -1622,6 +1624,7 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
memmove(buf, start, buf + buflen - start);
return 0;
}
+EXPORT_SYMBOL_GPL(cgroup_path);
/**
* cgroup_attach_task - attach task 'tsk' to cgroup 'cgrp'
@@ -1762,6 +1765,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
}
return true;
}
+EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
const char *buffer)
@@ -4036,6 +4040,7 @@ void __css_put(struct cgroup_subsys_state *css)
rcu_read_unlock();
WARN_ON_ONCE(val < 1);
}
+EXPORT_SYMBOL_GPL(__css_put);
/*
* Notify userspace when a cgroup is released, by running the
@@ -4151,6 +4156,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
return cssid->id;
return 0;
}
+EXPORT_SYMBOL_GPL(css_id);
unsigned short css_depth(struct cgroup_subsys_state *css)
{
@@ -4160,6 +4166,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
return cssid->depth;
return 0;
}
+EXPORT_SYMBOL_GPL(css_depth);
bool css_is_ancestor(struct cgroup_subsys_state *child,
const struct cgroup_subsys_state *root)
@@ -4196,6 +4203,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
spin_unlock(&ss->id_lock);
call_rcu(&id->rcu_head, __free_css_id_cb);
}
+EXPORT_SYMBOL_GPL(free_css_id);
/*
* This is called by init or create(). Then, calls to this function are
@@ -4312,6 +4320,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
return rcu_dereference(cssid->css);
}
+EXPORT_SYMBOL_GPL(css_lookup);
/**
* css_get_next - lookup next cgroup under specified hierarchy.
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/2] cgroups: blkio subsystem as module
2010-01-12 23:37 ` [PATCH v2 2/2] cgroups: blkio subsystem as module Ben Blum
@ 2010-01-14 9:15 ` Jens Axboe
2010-01-14 9:29 ` Li Zefan
0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2010-01-14 9:15 UTC (permalink / raw)
To: Ben Blum
Cc: Vivek Goyal, Li Zefan, ryov, KAMEZAWA Hiroyuki, Andrew Morton,
menage, containers, linux-kernel
On Tue, Jan 12 2010, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <bblum@andrew.cmu.edu>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
I'd like to add this for 2.6.34, but I'd like some sign-off(s) on the
cgroup side of things (for patch 1/2).
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/2] cgroups: blkio subsystem as module
2010-01-14 9:15 ` Jens Axboe
@ 2010-01-14 9:29 ` Li Zefan
0 siblings, 0 replies; 28+ messages in thread
From: Li Zefan @ 2010-01-14 9:29 UTC (permalink / raw)
To: Jens Axboe
Cc: Ben Blum, Vivek Goyal, ryov, KAMEZAWA Hiroyuki, Andrew Morton,
menage, containers, linux-kernel
Jens Axboe wrote:
> On Tue, Jan 12 2010, Ben Blum wrote:
>> Convert blk-cgroup to be buildable as a module
>>
>> From: Ben Blum <bblum@andrew.cmu.edu>
>>
>> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
>> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
>> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
>> enhanced to support the new module dependency.
>
> I'd like to add this for 2.6.34, but I'd like some sign-off(s) on the
> cgroup side of things (for patch 1/2).
>
Yes, it's for .34. But these 2 patches can't be applied to block-tree,
because the patches that implement cgroup modular feature are in -mm,
and they will go into .34.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-12 0:21 ` KAMEZAWA Hiroyuki
@ 2010-01-14 9:32 ` Balbir Singh
2010-01-14 11:42 ` Vivek Goyal
0 siblings, 1 reply; 28+ messages in thread
From: Balbir Singh @ 2010-01-14 9:32 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Vivek Goyal, Li Zefan, axboe, ryov, Andrew Morton, menage,
containers, linux-kernel
On Tue, Jan 12, 2010 at 5:51 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 8 Jan 2010 10:10:38 -0500
> Vivek Goyal <vgoyal@redhat.com> wrote:
>
>> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
>> > Convert blk-cgroup to be buildable as a module
>> >
>> > From: Ben Blum <bblum@andrew.cmu.edu>
>> >
>> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
>> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
>> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
>> > enhanced to support the new module dependency.
>> >
>>
>> Hi Ben,
>>
>> I will give this patch a try.
>>
>> So from blk-cgroup perspective, the advantage of allowing it as module
>> will be that we can save some memory if we are not using the controller?
>>
> Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
> introduced ?
>
My guess is it won't be, unless we start exposing page_cgroup API and
then make the module depend on memcg.
Balbir
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module
2010-01-14 9:32 ` Balbir Singh
@ 2010-01-14 11:42 ` Vivek Goyal
0 siblings, 0 replies; 28+ messages in thread
From: Vivek Goyal @ 2010-01-14 11:42 UTC (permalink / raw)
To: Balbir Singh
Cc: KAMEZAWA Hiroyuki, Li Zefan, axboe, ryov, Andrew Morton, menage,
containers, linux-kernel
On Thu, Jan 14, 2010 at 03:02:09PM +0530, Balbir Singh wrote:
> On Tue, Jan 12, 2010 at 5:51 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Fri, 8 Jan 2010 10:10:38 -0500
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> >> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> >> > Convert blk-cgroup to be buildable as a module
> >> >
> >> > From: Ben Blum <bblum@andrew.cmu.edu>
> >> >
> >> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> >> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> >> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> >> > enhanced to support the new module dependency.
> >> >
> >>
> >> Hi Ben,
> >>
> >> I will give this patch a try.
> >>
> >> So from blk-cgroup perspective, the advantage of allowing it as module
> >> will be that we can save some memory if we are not using the controller?
> >>
> > Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
> > introduced ?
> >
>
> My guess is it won't be, unless we start exposing page_cgroup API and
> then make the module depend on memcg.
I think I agree. When we introduce page_cgroup based page tracking, either
we need to export page_cgroup API or we can force blkio controller to
compile as in-kernel if user selects the CONFIG_PAGE_TRACKING option.
So as of now, I can't think why we should not we allow compiling blkio as
module as long as core cgroup functionality supports it safely.
Thanks
Vivek
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2010-01-14 11:43 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-31 5:10 [PATCH v4 0/4] cgroups: support for module-loadable subsystems Ben Blum
2009-12-31 5:12 ` [PATCH v4 1/4] cgroups: revamp subsys array Ben Blum
2009-12-31 5:13 ` [PATCH v4 2/4] cgroups: subsystem module loading interface Ben Blum
2009-12-31 5:14 ` [PATCH v4 3/4] cgroups: subsystem module unloading Ben Blum
2009-12-31 5:15 ` [PATCH v4 4/4] cgroups: net_cls as module Ben Blum
2010-01-07 0:04 ` [PATCH v4 0/4] cgroups: support for module-loadable subsystems Andrew Morton
2010-01-07 1:26 ` Ben Blum
2010-01-07 3:07 ` KAMEZAWA Hiroyuki
2010-01-07 6:42 ` Li Zefan
2010-01-07 7:16 ` KAMEZAWA Hiroyuki
2010-01-07 7:48 ` Ben Blum
2010-01-07 7:51 ` KAMEZAWA Hiroyuki
2010-01-07 8:04 ` Ben Blum
2010-01-07 8:14 ` Ben Blum
2010-01-07 8:22 ` Ben Blum
2010-01-08 5:27 ` Ben Blum
2010-01-08 5:29 ` [RFC] [PATCH 1/2] cgroups: modular subsystems support for use_id Ben Blum
2010-01-08 5:30 ` [RFC] [PATCH 2/2] cgroups: blkio subsystem as module Ben Blum
2010-01-08 15:10 ` Vivek Goyal
2010-01-12 0:21 ` KAMEZAWA Hiroyuki
2010-01-14 9:32 ` Balbir Singh
2010-01-14 11:42 ` Vivek Goyal
2010-01-08 16:33 ` Vivek Goyal
2010-01-12 23:34 ` Ben Blum
2010-01-12 23:36 ` [PATCH v2 1/2] cgroups: modular subsystems support for use_id Ben Blum
2010-01-12 23:37 ` [PATCH v2 2/2] cgroups: blkio subsystem as module Ben Blum
2010-01-14 9:15 ` Jens Axboe
2010-01-14 9:29 ` Li Zefan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox