* [PATCH v3 0/3] cgroup: Introducing bypass mode
@ 2017-08-09 17:55 Waiman Long
From: Waiman Long @ 2017-08-09 17:55 UTC (permalink / raw)
To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
torvalds, Roman Gushchin, Waiman Long
v2->v3:
- Remove invalid cgroup subdirectory creation patch.
- Add use cases for the bypass mode and remove statements about
control files ownership in cgroup-v2.txt.
- Restrict bypass mode to non-domain (threaded) controllers only.
v1->v2:
- Remove relax no-internal-process constraint patch as this feature
is in the thread mode v4 patch.
- Remove subtree root mode patch.
- Remove the skip dying css patch as I can no longer reproduce the
problem.
- Rework the bypass mode so that write to "cgroup.controllers"
to enable or disable controller interface files is only allowed
if the parent grants bypass mode to children by writing the
'#'-prefixed controller to "cgroup.subtree_control".
- Add a patch to disable subdirectory creation on an invalid domain.
v1 patch - https://lkml.org/lkml/2017/6/14/551
v2 patch - https://lkml.org/lkml/2017/7/21/606
This patchset introduces a new capability to the cgroup v2 core that
gives non-domain controllers more freedom and flexibility, so that
they can shape their own views of the virtual cgroup hierarchy to
best suit their own use cases. It also enables a cgroup parent to
selectively enable a non-domain controller in a subset of its child
cgroups instead of in either all or none of them.
The bypass mode cannot be used on domain controllers because it would
complicate the resource distribution model and rules.
One use case is an application that wants to use cpuset, for example,
to bind some worker threads to individual cpus. At the same time, the
application may also want to use the cpu controller to limit the
amount of cpu consumed by some other threads. Right now, the only way
to do that with the current v2 control scheme is to create child
cgroups with both cpu and cpuset controllers enabled and put the
desired processes or threads into those child cgroups.
The cost of enabling cpuset on a task that needs the cpu controller is
negligible. However, the cost of enabling the cpu controller on tasks
that only need cpuset can be noticeable. The performance difference
may become a concern for users who are thinking of moving from cgroup
v1 to v2.
Similarly, if we want to use perf_event, freezer or other non-domain
controllers instead of cpuset for a subset of tasks, we will also need
to enable the cpu controller and pay the associated performance cost.
With bypass mode, we will have the ability to enable just the
non-domain controllers that the tasks need in their respective child
cgroups. This is just like what we can currently do with cgroup v1.
This patchset is layered on top of the "for-4.14" branch of Tejun's
cgroup git tree.
Patch 1 introduces a new bypass mode that allows a non-domain
controller to be disabled in a cgroup but re-enabled in its
children. This is requested by writing the controller name prefixed
with '#' to the "cgroup.subtree_control" file. All the children of
that cgroup will then have this controller in bypass mode.
Patch 2 extends the bypass mode mechanism so that a child cgroup put
into bypass mode for a particular non-domain controller by its parent
can have that controller re-enabled by writing the controller name
with the '+' prefix to its "cgroup.controllers" file.
Patch 3 extends the debug controller to expose additional controller
masks introduced by this patchset.
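As an illustration of what a child's "cgroup.controllers" would show
under this series, here is a minimal user-space sketch that mimics
the patched cgroup_print_ss_mask(). It is not kernel code. The
controller table, its bit positions, and the helper names
format_controllers() and child_visible() are assumptions made for the
example, and the root and threaded masking done by cgroup_control()
is left out.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical controller table; bit positions are illustrative only. */
static const char *const controller_names[] = {
	"cpu", "cpuset", "memory", "io"
};
#define NR_CONTROLLERS \
	(sizeof(controller_names) / sizeof(controller_names[0]))

/*
 * Mask of controllers visibly enabled on a child: whatever the parent
 * enabled in subtree_control plus whatever the child re-enabled itself
 * (enable_ss_mask in patch 2).
 */
static uint16_t child_visible(uint16_t parent_subtree_control,
			      uint16_t child_enable_ss_mask)
{
	return parent_subtree_control | child_enable_ss_mask;
}

/*
 * List every controller set in ss_mask or bypass_mask.  A name gets a
 * '#' prefix when its bit is missing from ss_mask, i.e. when the
 * controller is only bypassed.
 */
static void format_controllers(uint16_t ss_mask, uint16_t bypass_mask,
			       char *out, size_t len)
{
	size_t used = 0;
	size_t i;

	if (!len)
		return;
	out[0] = '\0';

	for (i = 0; i < NR_CONTROLLERS; i++) {
		uint16_t bit = (uint16_t)(1u << i);
		int n;

		if (!((ss_mask | bypass_mask) & bit))
			continue;

		n = snprintf(out + used, len - used, "%s%s%s",
			     used ? " " : "",
			     (ss_mask & bit) ? "" : "#",
			     controller_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			return;	/* output truncated; stop here */
		used += (size_t)n;
	}
}
```

With a parent that enabled memory and bypassed cpu and cpuset, a
child that re-enabled cpuset would be modeled as
format_controllers(child_visible(0x4, 0x2), 0x3, buf, sizeof(buf)).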
Waiman Long (3):
cgroup: subtree_control bypass mode for non-domain controllers
cgroup: Allow reenabling of controller in bypass mode
cgroup: Make debug controller report new controller masks
Documentation/cgroup-v2.txt | 58 +++++++---
include/linux/cgroup-defs.h | 19 +++-
kernel/cgroup/cgroup.c | 250 +++++++++++++++++++++++++++++++++++---------
kernel/cgroup/debug.c | 2 +
4 files changed, 257 insertions(+), 72 deletions(-)
--
1.8.3.1
* [PATCH v3 1/3] cgroup: subtree_control bypass mode for non-domain controllers
From: Waiman Long @ 2017-08-09 17:55 UTC (permalink / raw)
To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
torvalds, Roman Gushchin, Waiman Long
The special prefix '#' attached to a non-domain controller name can now
be written into the cgroup.subtree_control file to set that controller
in bypass mode in all the child cgroups. The controller will show up
in the children's cgroup.controllers file, but the corresponding
control knobs will be absent. However, that controller can be
enabled or bypassed in its children by writing to their respective
subtree_control files.
This mode is useful for non-domain controllers where each additional
layer of hierarchy has a cost. It also allows more freedom in how each
controller shapes its effective hierarchy independently of the others.
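The prefix handling can be sketched in user space as follows. This is
only a model of the parsing and mask update done in
cgroup_subtree_control_write(), not the kernel code itself. The
controller table, struct ctrl_request, parse_subtree_control() and
apply_request() are hypothetical names for the example, and the
-ENOENT, -EBUSY and threaded-only checks are omitted.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical controller table; bit positions are illustrative only. */
static const char *const ctrl_names[] = { "cpu", "cpuset", "memory", "io" };
#define NR_CTRLS (sizeof(ctrl_names) / sizeof(ctrl_names[0]))

struct ctrl_request {
	uint16_t enable;	/* '+' prefixed controllers */
	uint16_t disable;	/* '-' prefixed controllers */
	uint16_t bypass;	/* '#' prefixed controllers */
};

/*
 * Parse a space separated list such as "+cpu #cpuset -io".  The buffer
 * is modified in place.  When a controller appears more than once, the
 * last operation wins.  Returns 0 on success or -EINVAL.
 */
static int parse_subtree_control(char *buf, struct ctrl_request *req)
{
	char *tok;

	memset(req, 0, sizeof(*req));
	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		uint16_t bit = 0;
		size_t i;

		/* strtok() never returns an empty token, so tok + 1 is safe */
		for (i = 0; i < NR_CTRLS; i++) {
			if (!strcmp(tok + 1, ctrl_names[i])) {
				bit = (uint16_t)(1u << i);
				break;
			}
		}
		if (!bit)
			return -EINVAL;

		/* clear the bit everywhere, then set it in one mask */
		req->enable &= (uint16_t)~bit;
		req->disable &= (uint16_t)~bit;
		req->bypass &= (uint16_t)~bit;
		switch (tok[0]) {
		case '+':
			req->enable |= bit;
			break;
		case '-':
			req->disable |= bit;
			break;
		case '#':
			req->bypass |= bit;
			break;
		default:
			return -EINVAL;
		}
	}
	return 0;
}

/* Apply a validated request to the two configured masks. */
static void apply_request(uint16_t *subtree_control, uint16_t *subtree_bypass,
			  const struct ctrl_request *req)
{
	*subtree_control |= req->enable;
	*subtree_control &= (uint16_t)~(req->bypass | req->disable);
	*subtree_bypass |= req->bypass;
	*subtree_bypass &= (uint16_t)~(req->enable | req->disable);
}
```

In this model a controller is in at most one of the enabled and
bypassed states after apply_request(), which matches the way the
patch updates ->subtree_control and ->subtree_bypass together.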
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/cgroup-defs.h | 12 ++--
kernel/cgroup/cgroup.c | 143 ++++++++++++++++++++++++++++----------------
2 files changed, 100 insertions(+), 55 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 59e4ad9..15655e5 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -308,16 +308,18 @@ struct cgroup {
struct cgroup_file events_file; /* handle for "cgroup.events" */
/*
- * The bitmask of subsystems enabled on the child cgroups.
- * ->subtree_control is the one configured through
- * "cgroup.subtree_control" while ->child_ss_mask is the effective
- * one which may have more subsystems enabled. Controller knobs
- * are made available iff it's enabled in ->subtree_control.
+ * The bitmask of subsystems enabled or bypassed on the child cgroups.
+ * ->subtree_control and ->subtree_bypass are the one configured
+ * through "cgroup.subtree_control" while ->subtree_ss_mask is the
+ * effective one which may have more subsystems enabled. Controller
+ * knobs are made available iff it's enabled in ->subtree_ss_mask.
*/
u16 subtree_control;
u16 subtree_ss_mask;
+ u16 subtree_bypass;
u16 old_subtree_control;
u16 old_subtree_ss_mask;
+ u16 old_subtree_bypass;
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f5ca55d..9e69f7f 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -365,7 +365,8 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp)
return false;
/* and no domain controllers can be enabled */
- if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask)
+ if ((cgrp->subtree_control|cgrp->subtree_bypass) &
+ ~cgrp_dfl_threaded_ss_mask)
return false;
return true;
@@ -387,7 +388,8 @@ bool cgroup_is_thread_root(struct cgroup *cgrp)
* enabled is a thread root.
*/
if (cgroup_has_tasks(cgrp) &&
- (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask))
+ ((cgrp->subtree_control|cgrp->subtree_bypass)
+ & cgrp_dfl_threaded_ss_mask))
return true;
return false;
@@ -412,7 +414,7 @@ static bool cgroup_is_valid_domain(struct cgroup *cgrp)
}
/* subsystems visibly enabled on a cgroup */
-static u16 cgroup_control(struct cgroup *cgrp)
+static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
{
struct cgroup *parent = cgroup_parent(cgrp);
u16 root_ss_mask = cgrp->root->subsys_mask;
@@ -420,6 +422,9 @@ static u16 cgroup_control(struct cgroup *cgrp)
if (parent) {
u16 ss_mask = parent->subtree_control;
+ if (show_bypass)
+ ss_mask |= parent->subtree_bypass;
+
/* threaded cgroups can only have threaded controllers */
if (cgroup_is_threaded(cgrp))
ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -433,13 +438,17 @@ static u16 cgroup_control(struct cgroup *cgrp)
}
/* subsystems enabled on a cgroup */
-static u16 cgroup_ss_mask(struct cgroup *cgrp)
+static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
{
struct cgroup *parent = cgroup_parent(cgrp);
if (parent) {
u16 ss_mask = parent->subtree_ss_mask;
+
+ if (show_bypass)
+ ss_mask |= parent->subtree_bypass;
+
/* threaded cgroups can only have threaded controllers */
if (cgroup_is_threaded(cgrp))
ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -492,7 +501,7 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
* This function is used while updating css associations and thus
* can't test the csses directly. Test ss_mask.
*/
- while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) {
+ while (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) {
cgrp = cgroup_parent(cgrp);
if (!cgrp)
return NULL;
@@ -2359,7 +2368,7 @@ int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp)
return 0;
/* apply no-internal-process constraint */
- if (dst_cgrp->subtree_control)
+ if (dst_cgrp->subtree_control|dst_cgrp->subtree_bypass)
return -EBUSY;
return 0;
@@ -2657,15 +2666,18 @@ void cgroup_procs_write_finish(struct task_struct *task)
ss->post_attach();
}
-static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
+static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask,
+ u16 bypass_mask)
{
struct cgroup_subsys *ss;
bool printed = false;
int ssid;
- do_each_subsys_mask(ss, ssid, ss_mask) {
+ do_each_subsys_mask(ss, ssid, ss_mask|bypass_mask) {
if (printed)
seq_putc(seq, ' ');
+ if (!(ss_mask & (1 << ssid)))
+ seq_putc(seq, '#');
seq_printf(seq, "%s", ss->name);
printed = true;
} while_each_subsys_mask();
@@ -2677,8 +2689,10 @@ static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
static int cgroup_controllers_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
+ struct cgroup *parent = cgroup_parent(cgrp);
+ u16 bypass = parent ? parent->subtree_bypass : 0;
- cgroup_print_ss_mask(seq, cgroup_control(cgrp));
+ cgroup_print_ss_mask(seq, cgroup_control(cgrp, false), bypass);
return 0;
}
@@ -2687,7 +2701,7 @@ static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
- cgroup_print_ss_mask(seq, cgrp->subtree_control);
+ cgroup_print_ss_mask(seq, cgrp->subtree_control, cgrp->subtree_bypass);
return 0;
}
@@ -2800,6 +2814,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
dsct->old_subtree_control = dsct->subtree_control;
dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
+ dsct->old_subtree_bypass = dsct->subtree_bypass;
}
}
@@ -2817,10 +2832,13 @@ static void cgroup_propagate_control(struct cgroup *cgrp)
struct cgroup_subsys_state *d_css;
cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
- dsct->subtree_control &= cgroup_control(dsct);
+ u16 mask = cgroup_control(dsct, true);
+
+ dsct->subtree_control &= mask;
+ dsct->subtree_bypass &= mask;
dsct->subtree_ss_mask =
cgroup_calc_subtree_ss_mask(dsct->subtree_control,
- cgroup_ss_mask(dsct));
+ cgroup_ss_mask(dsct, true));
}
}
@@ -2839,6 +2857,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
dsct->subtree_control = dsct->old_subtree_control;
dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
+ dsct->subtree_bypass = dsct->old_subtree_bypass;
}
}
@@ -2847,9 +2866,9 @@ static bool css_visible(struct cgroup_subsys_state *css)
struct cgroup_subsys *ss = css->ss;
struct cgroup *cgrp = css->cgroup;
- if (cgroup_control(cgrp) & (1 << ss->id))
+ if (cgroup_control(cgrp, false) & (1 << ss->id))
return true;
- if (!(cgroup_ss_mask(cgrp) & (1 << ss->id)))
+ if (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id)))
return false;
return cgroup_on_dfl(cgrp) && ss->implicit_on_dfl;
}
@@ -2880,7 +2899,7 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt));
- if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
+ if (!(cgroup_ss_mask(dsct, false) & (1 << ss->id)))
continue;
if (!css) {
@@ -2930,7 +2949,7 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
continue;
if (css->parent &&
- !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+ !(cgroup_ss_mask(dsct, false) & (1 << ss->id))) {
kill_css(css);
} else if (!css_visible(css)) {
css_clear_dir(css);
@@ -3042,7 +3061,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
char *buf, size_t nbytes,
loff_t off)
{
- u16 enable = 0, disable = 0;
+ u16 enable = 0, disable = 0, bypass = 0;
+ u16 child_enable = 0;
struct cgroup *cgrp, *child;
struct cgroup_subsys *ss;
char *tok;
@@ -3063,10 +3083,16 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
if (*tok == '+') {
enable |= 1 << ssid;
+ bypass &= ~(1 << ssid);
disable &= ~(1 << ssid);
} else if (*tok == '-') {
disable |= 1 << ssid;
enable &= ~(1 << ssid);
+ bypass &= ~(1 << ssid);
+ } else if (*tok == '#') {
+ bypass |= 1 << ssid;
+ enable &= ~(1 << ssid);
+ disable &= ~(1 << ssid);
} else {
return -EINVAL;
}
@@ -3080,35 +3106,42 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
if (!cgrp)
return -ENODEV;
- for_each_subsys(ss, ssid) {
- if (enable & (1 << ssid)) {
- if (cgrp->subtree_control & (1 << ssid)) {
- enable &= ~(1 << ssid);
- continue;
- }
+ /*
+ * Cannot use controllers that aren't allowed.
+ */
+ if (~cgroup_control(cgrp, true) & (enable|disable|bypass)) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
- if (!(cgroup_control(cgrp) & (1 << ssid))) {
- ret = -ENOENT;
- goto out_unlock;
- }
- } else if (disable & (1 << ssid)) {
- if (!(cgrp->subtree_control & (1 << ssid))) {
- disable &= ~(1 << ssid);
- continue;
- }
+ /*
+ * Strip out redundant bits.
+ */
+ enable &= ~cgrp->subtree_control;
+ bypass &= ~cgrp->subtree_bypass;
+ disable &= (cgrp->subtree_control|cgrp->subtree_bypass);
- /* a child has it enabled? */
- cgroup_for_each_live_child(child, cgrp) {
- if (child->subtree_control & (1 << ssid)) {
- ret = -EBUSY;
- goto out_unlock;
- }
- }
- }
+ if (!(enable|bypass|disable)) {
+ ret = 0;
+ goto out_unlock;
}
- if (!enable && !disable) {
- ret = 0;
+ /*
+ * Only threaded controllers can be bypassed.
+ */
+ if (bypass & ~cgrp_dfl_threaded_ss_mask) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ cgroup_for_each_live_child(child, cgrp)
+ child_enable |= child->subtree_control|child->subtree_bypass;
+
+ /*
+ * Cannot change the state of a controller if enabled in children.
+ */
+ if ((enable|bypass|disable) & child_enable) {
+ ret = -EBUSY;
goto out_unlock;
}
@@ -3120,7 +3153,9 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
cgroup_save_control(cgrp);
cgrp->subtree_control |= enable;
- cgrp->subtree_control &= ~disable;
+ cgrp->subtree_control &= ~(bypass|disable);
+ cgrp->subtree_bypass |= bypass;
+ cgrp->subtree_bypass &= ~(enable|disable);
ret = cgroup_apply_control(cgrp);
@@ -4565,7 +4600,8 @@ static void css_release(struct percpu_ref *ref)
}
static void init_and_link_css(struct cgroup_subsys_state *css,
- struct cgroup_subsys *ss, struct cgroup *cgrp)
+ struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_subsys_state *parent_css)
{
lockdep_assert_held(&cgroup_mutex);
@@ -4580,8 +4616,8 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css->serial_nr = css_serial_nr_next++;
atomic_set(&css->online_cnt, 0);
- if (cgroup_parent(cgrp)) {
- css->parent = cgroup_css(cgroup_parent(cgrp), ss);
+ if (parent_css) {
+ css->parent = parent_css;
css_get(css->parent);
}
@@ -4644,19 +4680,26 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
struct cgroup_subsys *ss)
{
struct cgroup *parent = cgroup_parent(cgrp);
- struct cgroup_subsys_state *parent_css = cgroup_css(parent, ss);
+ struct cgroup_subsys_state *parent_css = NULL;
struct cgroup_subsys_state *css;
int err;
lockdep_assert_held(&cgroup_mutex);
+ /*
+ * As cgroup may be in bypass mode, need to skip over ancestor
+ * cgroups with NULL CSS.
+ */
+ for (; parent && !parent_css; parent = cgroup_parent(parent))
+ parent_css = cgroup_css(parent, ss);
+
css = ss->css_alloc(parent_css);
if (!css)
css = ERR_PTR(-ENOMEM);
if (IS_ERR(css))
return css;
- init_and_link_css(css, ss, cgrp);
+ init_and_link_css(css, ss, cgrp, parent_css);
err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
if (err)
@@ -4762,7 +4805,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
* subtree_control from the parent. Each is configured manually.
*/
if (!cgroup_on_dfl(cgrp))
- cgrp->subtree_control = cgroup_control(cgrp);
+ cgrp->subtree_control = cgroup_control(cgrp, false);
if (parent)
cgroup_bpf_inherit(cgrp, parent);
@@ -5074,7 +5117,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
css = ss->css_alloc(cgroup_css(&cgrp_dfl_root.cgrp, ss));
/* We don't handle early failures gracefully */
BUG_ON(IS_ERR(css));
- init_and_link_css(css, ss, &cgrp_dfl_root.cgrp);
+ init_and_link_css(css, ss, &cgrp_dfl_root.cgrp, NULL);
/*
* Root csses are never destroyed and we can't initialize
--
1.8.3.1
* [PATCH v3 2/3] cgroup: Allow reenabling of controller in bypass mode
From: Waiman Long @ 2017-08-09 17:55 UTC (permalink / raw)
To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
torvalds, Roman Gushchin, Waiman Long
Non-domain controllers set to bypass mode in the parent's
"cgroup.subtree_control" can now be optionally enabled by writing the
controller name with the '+' prefix to "cgroup.controllers". Using the
'#' prefix will reset it back to the bypass state.
This capability allows a cgroup parent to individually enable
non-domain controllers in a subset of its children instead of either
all or none of them. This increases the flexibility each controller
has in shaping the effective cgroup hierarchy to best suit its needs.
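The checks applied to a write to "cgroup.controllers" can be modeled
in user space as below. This is a sketch of the mask logic only, with
input parsing, locking and the css updates left out. struct cg_masks
and controllers_write() are hypothetical names for the example; the
field names follow the struct cgroup members used in this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* The three configured masks this model needs from struct cgroup. */
struct cg_masks {
	uint16_t subtree_control;
	uint16_t subtree_bypass;
	uint16_t enable_ss_mask;
};

/*
 * Model of a write to a child's "cgroup.controllers": @enable holds the
 * '+' prefixed controllers and @bypass the '#' prefixed ones.
 * Returns 0 or a negative errno value.
 */
static int controllers_write(const struct cg_masks *parent,
			     struct cg_masks *cgrp,
			     uint16_t enable, uint16_t bypass)
{
	/* the root cgroup has no parent and cannot be written */
	if (!parent)
		return -EINVAL;

	/* only controllers bypassed by the parent can be named */
	if (~parent->subtree_bypass & (enable | bypass))
		return -ENOENT;

	/* drop requests that would not change anything */
	enable &= (uint16_t)~cgrp->enable_ss_mask;
	bypass &= cgrp->enable_ss_mask;
	if (!(enable | bypass))
		return 0;

	/* state is frozen once the cgroup passes the controller down */
	if ((cgrp->subtree_control | cgrp->subtree_bypass) & (enable | bypass))
		return -EBUSY;

	cgrp->enable_ss_mask |= enable;
	cgrp->enable_ss_mask &= (uint16_t)~bypass;
	return 0;
}
```

For example, with a parent whose subtree_bypass has only bit 0 set,
enabling bit 0 in a child succeeds, while enabling bit 1 returns
-ENOENT in this model.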
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/cgroup-v2.txt | 58 +++++++++++++++++------
include/linux/cgroup-defs.h | 7 +++
kernel/cgroup/cgroup.c | 109 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 156 insertions(+), 18 deletions(-)
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc44785..e76dc4cf 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -363,10 +363,16 @@ disabled by writing to the "cgroup.subtree_control" file::
# echo "+cpu +memory -io" > cgroup.subtree_control
+The prefixes '+', '-' and '#' are used to enable, disable or put
+a controller in the bypass mode respectively. In the bypass mode,
+a controller is disabled in a cgroup, but it can be enabled again in
+its child cgroups as it will still be listed in "cgroup.controllers".
+Bypass mode can only be used on non-domain controllers.
+
Only controllers which are listed in "cgroup.controllers" can be
-enabled. When multiple operations are specified as above, either they
-all succeed or fail. If multiple operations on the same controller
-are specified, the last one is effective.
+enabled or bypassed. When multiple operations are specified as above,
+either they all succeed or fail. If multiple operations on the same
+controller are specified, the last one is effective.
Enabling a controller in a cgroup indicates that the distribution of
the target resource across its immediate children will be controlled.
@@ -390,6 +396,20 @@ prefixed controller interface files from C and D. This means that the
controller interface files - anything which doesn't start with
"cgroup." are owned by the parent rather than the cgroup itself.
+Once a non-domain controller is put into bypass mode in
+"cgroup.subtree_control", that controller can optionally be enabled
+again in child cgroups by writing the controller name with the '+'
+prefix into "cgroup.controllers". Writing the controller name with
+the '#' prefix into "cgroup.controllers" resets the state back to
+bypass mode. The state of a non-domain controller cannot be changed
+anymore if it is enabled or bypassed in its "cgroup.subtree_control".
+
+The use of bypass mode thus allows a cgroup parent to have the ability
+to selectively enable a non-domain controller in a subset of its
+child cgroups instead of in either all or none of them. In other words,
+a non-domain controller can be enabled only on the cgroup that actually
+needs it, if desired.
+
Top-down Constraint
~~~~~~~~~~~~~~~~~~~
@@ -397,10 +417,11 @@ Top-down Constraint
Resources are distributed top-down and a cgroup can further distribute
a resource only if the resource has been distributed to it from the
parent. This means that all non-root "cgroup.subtree_control" files
-can only contain controllers which are enabled in the parent's
-"cgroup.subtree_control" file. A controller can be enabled only if
-the parent has the controller enabled and a controller can't be
-disabled if one or more children have it enabled.
+can only contain controllers which are enabled or bypassed in the parent's
+"cgroup.subtree_control" file. A controller can be enabled or bypassed
+only if the parent has the controller enabled or bypassed and the
+state of a controller can't be changed if one or more children have
+it enabled or bypassed.
No Internal Process Constraint
@@ -823,11 +844,18 @@ All cgroup core files are prefixed with "cgroup."
should be granted along with the containing directory.
cgroup.controllers
- A read-only space separated values file which exists on all
+ A read-write space separated values file which exists on all
cgroups.
It shows space separated list of all controllers available to
- the cgroup. The controllers are not ordered.
+ the cgroup. Controller names with '#' prefix are in bypass
+ mode. The controllers are not ordered.
+
+ When a controller is set into bypass mode in its parent's
+ "cgroup.subtree_control", its name prefixed with '+' or '#'
+ can be written to enable it or reset it back to bypass mode
+ respectively. Controllers not in bypass mode are not allowed
+ to be written.
cgroup.subtree_control
A read-write space separated values file which exists on all
@@ -837,12 +865,12 @@ All cgroup core files are prefixed with "cgroup."
which are enabled to control resource distribution from the
cgroup to its children.
- Space separated list of controllers prefixed with '+' or '-'
- can be written to enable or disable controllers. A controller
- name prefixed with '+' enables the controller and '-'
- disables. If a controller appears more than once on the list,
- the last one is effective. When multiple enable and disable
- operations are specified, either all succeed or all fail.
+ Space separated list of controllers prefixed with '+', '-' or
+ '#' can be written to enable, disable or bypass controllers
+ respectively. If a controller appears more than once on
+ the list, the last one is effective. When multiple enable,
+ disable or bypass operations are specified, either all succeed
+ or all fail.
cgroup.events
A read-only flat-keyed file which exists on non-root cgroups.
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 15655e5..9f03254 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -321,6 +321,13 @@ struct cgroup {
u16 old_subtree_ss_mask;
u16 old_subtree_bypass;
+ /*
+ * The bitmask of subsystems that are set in its parent's
+ * ->subtree_bypass and explicitly enabled in this cgroup.
+ */
+ u16 enable_ss_mask;
+ u16 old_enable_ss_mask;
+
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 9e69f7f..17591fb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -420,7 +420,7 @@ static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
u16 root_ss_mask = cgrp->root->subsys_mask;
if (parent) {
- u16 ss_mask = parent->subtree_control;
+ u16 ss_mask = parent->subtree_control|cgrp->enable_ss_mask;
if (show_bypass)
ss_mask |= parent->subtree_bypass;
@@ -443,7 +443,7 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
struct cgroup *parent = cgroup_parent(cgrp);
if (parent) {
- u16 ss_mask = parent->subtree_ss_mask;
+ u16 ss_mask = parent->subtree_ss_mask|cgrp->enable_ss_mask;
if (show_bypass)
@@ -2815,6 +2815,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
dsct->old_subtree_control = dsct->subtree_control;
dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
dsct->old_subtree_bypass = dsct->subtree_bypass;
+ dsct->old_enable_ss_mask = dsct->enable_ss_mask;
}
}
@@ -2858,6 +2859,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
dsct->subtree_control = dsct->old_subtree_control;
dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
dsct->subtree_bypass = dsct->old_subtree_bypass;
+ dsct->enable_ss_mask = dsct->old_enable_ss_mask;
}
}
@@ -3135,7 +3137,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
}
cgroup_for_each_live_child(child, cgrp)
- child_enable |= child->subtree_control|child->subtree_bypass;
+ child_enable |= child->subtree_control|child->subtree_bypass|
+ child->enable_ss_mask;
/*
* Cannot change the state of a controller if enabled in children.
@@ -3168,6 +3171,105 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+/*
+ * Change bypass status of controllers for a cgroup in the default hierarchy.
+ */
+static ssize_t cgroup_controllers_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes,
+ loff_t off)
+{
+ u16 enable = 0, bypass = 0;
+ struct cgroup *cgrp, *parent;
+ struct cgroup_subsys *ss;
+ char *tok;
+ int ssid, ret;
+
+ /*
+ * Parse input - space separated list of subsystem names prefixed
+ * with either + or #.
+ */
+ buf = strstrip(buf);
+ while ((tok = strsep(&buf, " "))) {
+ if (tok[0] == '\0')
+ continue;
+ do_each_subsys_mask(ss, ssid, ~cgrp_dfl_inhibit_ss_mask) {
+ if (!cgroup_ssid_enabled(ssid) ||
+ strcmp(tok + 1, ss->name))
+ continue;
+
+ if (*tok == '+') {
+ enable |= 1 << ssid;
+ bypass &= ~(1 << ssid);
+ } else if (*tok == '#') {
+ bypass |= 1 << ssid;
+ enable &= ~(1 << ssid);
+ } else {
+ return -EINVAL;
+ }
+ break;
+ } while_each_subsys_mask();
+ if (ssid == CGROUP_SUBSYS_COUNT)
+ return -EINVAL;
+ }
+
+ cgrp = cgroup_kn_lock_live(of->kn, true);
+ if (!cgrp)
+ return -ENODEV;
+
+ /*
+ * Write to root cgroup's controllers file is not allowed.
+ */
+ parent = cgroup_parent(cgrp);
+ if (!parent) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /*
+ * Only controllers set into bypass mode in the parent cgroup
+ * can be specified here.
+ */
+ if (~parent->subtree_bypass & (enable|bypass)) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ /*
+ * Mask off irrelevant bits.
+ */
+ enable &= ~cgrp->enable_ss_mask;
+ bypass &= cgrp->enable_ss_mask;
+
+ if (!(enable|bypass)) {
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * We cannot change the bypass state of a controller that is enabled
+ * in subtree_control.
+ */
+ if ((cgrp->subtree_control|cgrp->subtree_bypass) & (enable|bypass)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ /* Save and update control masks and prepare csses */
+ cgroup_save_control(cgrp);
+
+ cgrp->enable_ss_mask |= enable;
+ cgrp->enable_ss_mask &= ~bypass;
+
+ ret = cgroup_apply_control(cgrp);
+ cgroup_finalize_control(cgrp, ret);
+ kernfs_activate(cgrp->kn);
+ ret = 0;
+
+out_unlock:
+ cgroup_kn_unlock(of->kn);
+ return ret ?: nbytes;
+}
+
/**
* cgroup_enable_threaded - make @cgrp threaded
* @cgrp: the target cgroup
@@ -4433,6 +4535,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
{
.name = "cgroup.controllers",
.seq_show = cgroup_controllers_show,
+ .write = cgroup_controllers_write,
},
{
.name = "cgroup.subtree_control",
--
1.8.3.1
* [PATCH v3 3/3] cgroup: Make debug controller report new controller masks
From: Waiman Long @ 2017-08-09 17:55 UTC (permalink / raw)
To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
torvalds, Roman Gushchin, Waiman Long
The newly added cgroup controller masks (subtree_bypass and
enable_ss_mask) are now being reported in the debug.masks controller
file.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/debug.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index f661b4c..5f35a76 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -262,6 +262,8 @@ static int cgroup_masks_read(struct seq_file *seq, void *v)
cgroup_masks_read_one(seq, "subtree_control", cgrp->subtree_control);
cgroup_masks_read_one(seq, "subtree_ss_mask", cgrp->subtree_ss_mask);
+ cgroup_masks_read_one(seq, "subtree_bypass", cgrp->subtree_bypass);
+ cgroup_masks_read_one(seq, "enable_ss_mask", cgrp->enable_ss_mask);
cgroup_kn_unlock(of->kn);
return 0;
--
1.8.3.1