* [PATCH v2 0/4] cgroup: Introducing bypass mode
@ 2017-07-21 20:34 Waiman Long
[not found] ` <1500669293-21792-1-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw)
To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg,
pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ,
efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
guro-b10kYP2dOMg, Waiman Long
v1->v2:
- Remove relax no-internal-process constraint patch as this feature
is in the thread mode v4 patch.
- Remove subtree root mode patch.
- Remove the skip dying css patch as I can no longer reproduce the
problem.
- Rework the bypass mode so that write to "cgroup.controllers"
to enable or disable controller interface files is only allowed
if the parent grants bypass mode to children by writing the
'#'-prefixed controller to "cgroup.subtree_control".
- Add a patch to disable subdirectory creation on an invalid domain.
v1 patch - https://lkml.org/lkml/2017/6/14/551
This patchset introduces new capability to the cgroup v2 core to give
more freedom and flexibility to controllers so that they can shape
their own unique views of the virtual cgroup hierarchies that can
best suit thier own use cases.
This patchset is layered on top of the "review-cgroup2-cpu-on-v4"
branch of Tejun's cgroup git tree.
Patch 1 disables subdirectory creation when a cgroup is an invalid
domain.
Patch 2 introduces a new bypass mode that allows a controller to
be disabled in a cgroup, but re-enabled again in its children. This
is enabled by writing the controller name prefixed with '#' to the
"cgroup.subtree_control" file. Then all its children will have this
controller in bypass mode.
Patch 3 extends the bypass mode mechanism to allow those child cgroups
that are put into the bypass mode by their parent to re-enable the
controller by writing the controller name with the '+' prefix to the
"cgroup.controllers" file if they choose to. So setting bypass mode
in "cgroup.subtree_control" effectively delegates the authority to
enable or disable a controller to its children.
Patch 4 extends the debug controller to expose additional controller
masks introduced by this patchset.
Waiman Long (4):
cgroup: Child cgroup creation not allowed on invalid domain
cgroup: Allow bypass mode in subtree_control
cgroup: Allow reenabling of controller in bypass mode
cgroup: Make debug controller report new controller masks
Documentation/cgroup-v2.txt | 90 +++++++++++++---
include/linux/cgroup-defs.h | 19 +++-
kernel/cgroup/cgroup.c | 251 +++++++++++++++++++++++++++++++++++---------
kernel/cgroup/debug.c | 2 +
4 files changed, 288 insertions(+), 74 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 16+ messages in thread[parent not found: <1500669293-21792-1-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain [not found] ` <1500669293-21792-1-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-07-21 20:34 ` Waiman Long 2017-07-22 13:43 ` Tejun Heo 0 siblings, 1 reply; 16+ messages in thread From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw) To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg, Waiman Long When thread mode is used, it is possible that some cgroups may be in an invalid state. Currently users may not be aware that they are invalid until they try to migrate tasks over. This patch disallows child cgroup creation on invalid domain. This adds one more failure point in reminding users that they are dealing with invalid domains. It also minimizes the number of invalid domains outstanding as much as possible. Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- kernel/cgroup/cgroup.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index e9a377d..5fc8133 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -4664,6 +4664,14 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode) if (!parent) return -ENODEV; + /* + * New cgroup creation isn't allowed on an invalid domain parent. + */ + if (!cgroup_is_threaded(parent) && !cgroup_is_valid_domain(parent)) { + ret = -EOPNOTSUPP; + goto out_unlock; + } + cgrp = cgroup_create(parent); if (IS_ERR(cgrp)) { ret = PTR_ERR(cgrp); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain 2017-07-21 20:34 ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long @ 2017-07-22 13:43 ` Tejun Heo [not found] ` <20170722134358.GB3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> 2017-07-24 17:48 ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long 0 siblings, 2 replies; 16+ messages in thread From: Tejun Heo @ 2017-07-22 13:43 UTC (permalink / raw) To: Waiman Long Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro Hello, Waiman. On Fri, Jul 21, 2017 at 04:34:50PM -0400, Waiman Long wrote: > When thread mode is used, it is possible that some cgroups may be > in an invalid state. Currently users may not be aware that they are > invalid until they try to migrate tasks over. This patch disallows > child cgroup creation on invalid domain. This adds one more failure > point in reminding users that they are dealing with invalid domains. > It also minimizes the number of invalid domains outstanding as much > as possible. It's a bit inconsistent because we can reach the same forbidden state by turning a sibling cgroup threaded. Please consider the following. A / \ B C \ D Let's say all are domains and we make B threaded. A becomes the threaded domain, C and D become invalid, which is the configuration you're trying to prevent. We can either enabling threaded on B too or relax type modifications further so that people can make C threaded which makes sense given that that would lead to a topology which has to supported anyway (if C were threaded before D was created, it'd look the same). So, I'm leaning more towards relaxing restrictions and tightening it, and given that we have to expose invalid state anyway, I think there's actual benefit in doing so as it gives more flexibility while building the hierarchy. Thanks. -- tejun ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20170722134358.GB3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>]
* [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode [not found] ` <20170722134358.GB3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-07-23 12:18 ` Tejun Heo [not found] ` <20170723121826.GC1498614-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> 2017-07-25 17:16 ` [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode Tejun Heo 0 siblings, 2 replies; 16+ messages in thread From: Tejun Heo @ 2017-07-23 12:18 UTC (permalink / raw) To: Waiman Long Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg cgroup_enable_threaded() checks that the cgroup doesn't have any tasks or children and fails the operation if so. This test is unnecessary because the first part is already checked by cgroup_can_be_thread_root() and the latter is unnecessary. The latter actually cause a behavioral oddity. Please consider the following hierarchy. All cgroups are domains. A / \ B C \ D If B is made threaded, C and D becomes invalid domains. Due to the no children restriction, threaded mode can't be enabled on C. For C and D, the only thing the user can do is removal. There is no reason for this restriction. Remove it. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Cc: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- Hello, So, the only thing inconsistent there was the extra restriction when enabling threaded mode when all the necessary conditions are already checked by when verifying the parent's domain. Thanks. Documentation/cgroup-v2.txt | 5 +++-- kernel/cgroup/cgroup.c | 7 ------- 2 files changed, 3 insertions(+), 9 deletions(-) --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -274,8 +274,9 @@ thread mode, the following conditions mu - As the cgroup will join the parent's resource domain. The parent must either be a valid (threaded) domain or a threaded cgroup. -- The cgroup must be empty. No enabled controllers, child cgroups or - processes. +- When the parent is an unthreaded domain, it must not have any domain + controllers enabled or populated domain children. The root is + exempt from this requirement. Topology-wise, a cgroup can be in an invalid state. Please consider the following toplogy:: --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3147,13 +3147,6 @@ static int cgroup_enable_threaded(struct return -EOPNOTSUPP; /* - * Allow enabling thread mode only on empty cgroups to avoid - * implicit migrations and recursive operations. - */ - if (cgroup_has_tasks(cgrp) || css_has_online_children(&cgrp->self)) - return -EBUSY; - - /* * The following shouldn't cause actual migrations and should * always succeed. */ ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20170723121826.GC1498614-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode [not found] ` <20170723121826.GC1498614-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-07-24 19:14 ` Waiman Long [not found] ` <c2637f47-3e90-a322-f84b-0c913da14977-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Waiman Long @ 2017-07-24 19:14 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg On 07/23/2017 08:18 AM, Tejun Heo wrote: > cgroup_enable_threaded() checks that the cgroup doesn't have any tasks > or children and fails the operation if so. This test is unnecessary > because the first part is already checked by > cgroup_can_be_thread_root() and the latter is unnecessary. The latter > actually cause a behavioral oddity. Please consider the following > hierarchy. All cgroups are domains. > > A > / \ > B C > \ > D > > If B is made threaded, C and D becomes invalid domains. Due to the no > children restriction, threaded mode can't be enabled on C. For C and > D, the only thing the user can do is removal. > > There is no reason for this restriction. Remove it. > > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > Cc: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > --- > Hello, > > So, the only thing inconsistent there was the extra restriction when > enabling threaded mode when all the necessary conditions are already > checked by when verifying the parent's domain. > > Thanks. > > Documentation/cgroup-v2.txt | 5 +++-- > kernel/cgroup/cgroup.c | 7 ------- > 2 files changed, 3 insertions(+), 9 deletions(-) > > --- a/Documentation/cgroup-v2.txt > +++ b/Documentation/cgroup-v2.txt > @@ -274,8 +274,9 @@ thread mode, the following conditions mu > - As the cgroup will join the parent's resource domain. The parent > must either be a valid (threaded) domain or a threaded cgroup. > > -- The cgroup must be empty. No enabled controllers, child cgroups or > - processes. > +- When the parent is an unthreaded domain, it must not have any domain > + controllers enabled or populated domain children. The root is > + exempt from this requirement. > > Topology-wise, a cgroup can be in an invalid state. Please consider > the following toplogy:: > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -3147,13 +3147,6 @@ static int cgroup_enable_threaded(struct > return -EOPNOTSUPP; > > /* > - * Allow enabling thread mode only on empty cgroups to avoid > - * implicit migrations and recursive operations. > - */ > - if (cgroup_has_tasks(cgrp) || css_has_online_children(&cgrp->self)) > - return -EBUSY; > - > - /* > * The following shouldn't cause actual migrations and should > * always succeed. > */ Acked-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> While you are at it, the one comment that I have with cgroup_enable_threaded() is that it does not check to see if the cgroup parent exists. I thought for a while the first time I saw it, but then realized that cgroup.type wasn't show in root. I think a comment about this will make the code less puzzling to causal reader. Cheers, Longman ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <c2637f47-3e90-a322-f84b-0c913da14977-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH cgroup/for-4.14] cgroup: add comment to cgroup_enable_threaded() [not found] ` <c2637f47-3e90-a322-f84b-0c913da14977-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-07-25 17:22 ` Tejun Heo 0 siblings, 0 replies; 16+ messages in thread From: Tejun Heo @ 2017-07-25 17:22 UTC (permalink / raw) To: Waiman Long Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg From c705a00d77457b44ba3790fdf0627ecb8593a254 Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Date: Tue, 25 Jul 2017 13:20:18 -0400 Explain cgroup_enable_threaded() and note that the function can never be called on the root cgroup. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Suggested-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- Hello, Waiman. Good point. Added comment explaining what's going on. Applying the patch to cgroup/for-4.14. Thanks. kernel/cgroup/cgroup.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index e0a558c..85f6a11 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3129,6 +3129,15 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, return ret ?: nbytes; } +/** + * cgroup_enable_threaded - make @cgrp threaded + * @cgrp: the target cgroup + * + * Called when "threaded" is written to the cgroup.type interface file and + * tries to make @cgrp threaded and join the parent's resource domain. + * This function is never called on the root cgroup as cgroup.type doesn't + * exist on it. + */ static int cgroup_enable_threaded(struct cgroup *cgrp) { struct cgroup *parent = cgroup_parent(cgrp); -- 2.9.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode 2017-07-23 12:18 ` [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode Tejun Heo [not found] ` <20170723121826.GC1498614-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-07-25 17:16 ` Tejun Heo 1 sibling, 0 replies; 16+ messages in thread From: Tejun Heo @ 2017-07-25 17:16 UTC (permalink / raw) To: Waiman Long Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro On Sun, Jul 23, 2017 at 08:18:26AM -0400, Tejun Heo wrote: > cgroup_enable_threaded() checks that the cgroup doesn't have any tasks > or children and fails the operation if so. This test is unnecessary > because the first part is already checked by > cgroup_can_be_thread_root() and the latter is unnecessary. The latter > actually cause a behavioral oddity. Please consider the following > hierarchy. All cgroups are domains. > > A > / \ > B C > \ > D > > If B is made threaded, C and D becomes invalid domains. Due to the no > children restriction, threaded mode can't be enabled on C. For C and > D, the only thing the user can do is removal. > > There is no reason for this restriction. Remove it. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Cc: Waiman Long <longman@redhat.com> Applied to cgroup/for-4.14 with Waiman's ack. Thanks. -- tejun ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain 2017-07-22 13:43 ` Tejun Heo [not found] ` <20170722134358.GB3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-07-24 17:48 ` Waiman Long 1 sibling, 0 replies; 16+ messages in thread From: Waiman Long @ 2017-07-24 17:48 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro On 07/22/2017 09:43 AM, Tejun Heo wrote: > Hello, Waiman. > > On Fri, Jul 21, 2017 at 04:34:50PM -0400, Waiman Long wrote: >> When thread mode is used, it is possible that some cgroups may be >> in an invalid state. Currently users may not be aware that they are >> invalid until they try to migrate tasks over. This patch disallows >> child cgroup creation on invalid domain. This adds one more failure >> point in reminding users that they are dealing with invalid domains. >> It also minimizes the number of invalid domains outstanding as much >> as possible. > It's a bit inconsistent because we can reach the same forbidden state > by turning a sibling cgroup threaded. Please consider the following. > > A > / \ > B C > \ > D > > Let's say all are domains and we make B threaded. A becomes the > threaded domain, C and D become invalid, which is the configuration > you're trying to prevent. We can either enabling threaded on B too or > relax type modifications further so that people can make C threaded > which makes sense given that that would lead to a topology which has > to supported anyway (if C were threaded before D was created, it'd > look the same). > > So, I'm leaning more towards relaxing restrictions and tightening it, > and given that we have to expose invalid state anyway, I think there's > actual benefit in doing so as it gives more flexibility while building > the hierarchy. Yes, I totally understand that we could have this happened in a sibling node. It is just my idea to make users become more aware that they are dealing with an invalid domain cgroup. It was just a suggestion on my part and I am totally fine if it is not merged. Cheers, Longman ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control 2017-07-21 20:34 [PATCH v2 0/4] cgroup: Introducing bypass mode Waiman Long [not found] ` <1500669293-21792-1-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-07-21 20:34 ` Waiman Long [not found] ` <1500669293-21792-3-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-07-21 20:34 ` [PATCH v2 3/4] cgroup: Allow reenabling of controller in bypass mode Waiman Long 2017-07-21 20:34 ` [PATCH v2 4/4] cgroup: Make debug controller report new controller masks Waiman Long 3 siblings, 1 reply; 16+ messages in thread From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw) To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro, Waiman Long The special prefix '#' attached to a controller name can now be written into the cgroup.subtree_control file to set that controller in bypass mode in all the child cgroups. The controller will show up in the children's cgroup.controllers file, but the corresponding control knobs will be absent. However, that controller can be enabled or bypassed in its children by writing to their respective subtree_control files. This mode can be useful to non-domain controllers or controllers where there are costs to each additional layer of hierarchy. This mode will also allow more freedom in how each controller can shape its effective hierarchy independent of each others. Signed-off-by: Waiman Long <longman@redhat.com> --- Documentation/cgroup-v2.txt | 67 +++++++++++++++++----- include/linux/cgroup-defs.h | 12 ++-- kernel/cgroup/cgroup.c | 136 ++++++++++++++++++++++++++++---------------- 3 files changed, 145 insertions(+), 70 deletions(-) diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index 43f9811..f17a74b 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -24,8 +24,9 @@ v1 is available under Documentation/cgroup-v1/. 2-3. [Un]populated Notification 2-4. Controlling Controllers 2-4-1. Enabling and Disabling - 2-4-2. Top-down Constraint - 2-4-3. No Internal Process Constraint + 2-4-2. Cgroup Hierarchy + 2-4-3. Top-down Constraint + 2-4-4. No Internal Process Constraint 2-5. Delegation 2-5-1. Model of Delegation 2-5-2. Delegation Containment @@ -362,10 +363,15 @@ disabled by writing to the "cgroup.subtree_control" file:: # echo "+cpu +memory -io" > cgroup.subtree_control +The prefixes '+', '-' and '#' are used to enable, disable or put a +controller in the bypass mode respectively. In the bypass mode, a +controller is disabled in a cgroup, but it can be enabled again in its +child cgroups as it will still be listed in "cgroup.controllers". + Only controllers which are listed in "cgroup.controllers" can be -enabled. When multiple operations are specified as above, either they -all succeed or fail. If multiple operations on the same controller -are specified, the last one is effective. +enabled or bypassed. When multiple operations are specified as above, +either they all succeed or fail. If multiple operations on the same +controller are specified, the last one is effective. Enabling a controller in a cgroup indicates that the distribution of the target resource across its immediate children will be controlled. @@ -390,16 +396,47 @@ controller interface files - anything which doesn't start with "cgroup." are owned by the parent rather than the cgroup itself. +Cgroup Hierarchy +~~~~~~~~~~~~~~~~ + +The hierarchy as seen from each controller's perspective can be different +from that of another controller depending on which controllers are enabled +or disabled in various nodes within the full cgroup hierarchy. + +In the example below:: + + A(+) - B(#) - C(#) - D(#) - E(+) - F(-) + \ G(+) + +The '+', '-' and '#' markers in parenthesis indicate a controller in +enabled, disabled or bypass mode respectively. + +Disabling a controller in a cgroup effectively collapses it to its +parent from that controller's point of view. In this case, the +effective hierarchy as seen from that particular controller will be:: + + A|B|C|D - E|F + \ G + +There are three effective node clusters. The top cluster contains +(A, B, C, D) with two leaf clusters (E, F) and (G). + +Disabling controllers from the leaves up and using bypass mode in +the middle layers allow controllers to have their own unique views +of the cgroup hierarchy that can best fit their own need. + + Top-down Constraint ~~~~~~~~~~~~~~~~~~~ Resources are distributed top-down and a cgroup can further distribute a resource only if the resource has been distributed to it from the parent. This means that all non-root "cgroup.subtree_control" files -can only contain controllers which are enabled in the parent's -"cgroup.subtree_control" file. A controller can be enabled only if -the parent has the controller enabled and a controller can't be -disabled if one or more children have it enabled. +can only contain controllers which are enabled or bypassed in the parent's +"cgroup.subtree_control" file. A controller can be enabled or bypassed +only if the parent has the controller enabled or bypassed and the +state of a controller can't be changed if one or more children have +it enabled or bypassed. No Internal Process Constraint @@ -836,12 +873,12 @@ All cgroup core files are prefixed with "cgroup." which are enabled to control resource distribution from the cgroup to its children. - Space separated list of controllers prefixed with '+' or '-' - can be written to enable or disable controllers. A controller - name prefixed with '+' enables the controller and '-' - disables. If a controller appears more than once on the list, - the last one is effective. When multiple enable and disable - operations are specified, either all succeed or all fail. + Space separated list of controllers prefixed with '+', '-' or + '#' can be written to enable, disable or bypass controllers + respectively. If a controller appears more than once on + the list, the last one is effective. When multiple enable, + disable or bypass operations are specified, either all succeed + or all fail. cgroup.events A read-only flat-keyed file which exists on non-root cgroups. diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 9d74195..3cac6d0 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -295,16 +295,18 @@ struct cgroup { struct cgroup_file events_file; /* handle for "cgroup.events" */ /* - * The bitmask of subsystems enabled on the child cgroups. - * ->subtree_control is the one configured through - * "cgroup.subtree_control" while ->child_ss_mask is the effective - * one which may have more subsystems enabled. Controller knobs - * are made available iff it's enabled in ->subtree_control. + * The bitmask of subsystems enabled or bypassed on the child cgroups. + * ->subtree_control and ->subtree_bypass are the one configured + * through "cgroup.subtree_control" while ->subtree_ss_mask is the + * effective one which may have more subsystems enabled. Controller + * knobs are made available iff it's enabled in ->subtree_ss_mask. */ u16 subtree_control; u16 subtree_ss_mask; + u16 subtree_bypass; u16 old_subtree_control; u16 old_subtree_ss_mask; + u16 old_subtree_bypass; /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 5fc8133..1e7feae 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -365,7 +365,8 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp) return false; /* and no domain controllers can be enabled */ - if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask) + if ((cgrp->subtree_control|cgrp->subtree_bypass) & + ~cgrp_dfl_threaded_ss_mask) return false; return true; @@ -387,7 +388,8 @@ bool cgroup_is_thread_root(struct cgroup *cgrp) * enabled is a thread root. */ if (cgroup_has_tasks(cgrp) && - (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask)) + ((cgrp->subtree_control|cgrp->subtree_bypass) + & cgrp_dfl_threaded_ss_mask)) return true; return false; @@ -412,7 +414,7 @@ static bool cgroup_is_valid_domain(struct cgroup *cgrp) } /* subsystems visibly enabled on a cgroup */ -static u16 cgroup_control(struct cgroup *cgrp) +static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass) { struct cgroup *parent = cgroup_parent(cgrp); u16 root_ss_mask = cgrp->root->subsys_mask; @@ -420,6 +422,9 @@ static u16 cgroup_control(struct cgroup *cgrp) if (parent) { u16 ss_mask = parent->subtree_control; + if (show_bypass) + ss_mask |= parent->subtree_bypass; + /* threaded cgroups can only have threaded controllers */ if (cgroup_is_threaded(cgrp)) ss_mask &= cgrp_dfl_threaded_ss_mask; @@ -433,13 +438,17 @@ static u16 cgroup_control(struct cgroup *cgrp) } /* subsystems enabled on a cgroup */ -static u16 cgroup_ss_mask(struct cgroup *cgrp) +static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass) { struct cgroup *parent = cgroup_parent(cgrp); if (parent) { u16 ss_mask = parent->subtree_ss_mask; + + if (show_bypass) + ss_mask |= parent->subtree_bypass; + /* threaded cgroups can only have threaded controllers */ if (cgroup_is_threaded(cgrp)) ss_mask &= cgrp_dfl_threaded_ss_mask; @@ -492,7 +501,7 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp, * This function is used while updating css associations and thus * can't test the csses directly. Test ss_mask. */ - while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) { + while (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) { cgrp = cgroup_parent(cgrp); if (!cgrp) return NULL; @@ -2355,7 +2364,7 @@ int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp) return 0; /* apply no-internal-process constraint */ - if (dst_cgrp->subtree_control) + if (dst_cgrp->subtree_control|dst_cgrp->subtree_bypass) return -EBUSY; return 0; @@ -2653,15 +2662,18 @@ void cgroup_procs_write_finish(struct task_struct *task) ss->post_attach(); } -static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask) +static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask, + u16 bypass_mask) { struct cgroup_subsys *ss; bool printed = false; int ssid; - do_each_subsys_mask(ss, ssid, ss_mask) { + do_each_subsys_mask(ss, ssid, ss_mask|bypass_mask) { if (printed) seq_putc(seq, ' '); + if (!(ss_mask & (1 << ssid))) + seq_putc(seq, '#'); seq_printf(seq, "%s", ss->name); printed = true; } while_each_subsys_mask(); @@ -2673,8 +2685,10 @@ static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask) static int cgroup_controllers_show(struct seq_file *seq, void *v) { struct cgroup *cgrp = seq_css(seq)->cgroup; + struct cgroup *parent = cgroup_parent(cgrp); + u16 bypass = parent ? parent->subtree_bypass : 0; - cgroup_print_ss_mask(seq, cgroup_control(cgrp)); + cgroup_print_ss_mask(seq, cgroup_control(cgrp, false), bypass); return 0; } @@ -2683,7 +2697,7 @@ static int cgroup_subtree_control_show(struct seq_file *seq, void *v) { struct cgroup *cgrp = seq_css(seq)->cgroup; - cgroup_print_ss_mask(seq, cgrp->subtree_control); + cgroup_print_ss_mask(seq, cgrp->subtree_control, cgrp->subtree_bypass); return 0; } @@ -2796,6 +2810,7 @@ static void cgroup_save_control(struct cgroup *cgrp) cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) { dsct->old_subtree_control = dsct->subtree_control; dsct->old_subtree_ss_mask = dsct->subtree_ss_mask; + dsct->old_subtree_bypass = dsct->subtree_bypass; } } @@ -2813,10 +2828,13 @@ static void cgroup_propagate_control(struct cgroup *cgrp) struct cgroup_subsys_state *d_css; cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) { - dsct->subtree_control &= cgroup_control(dsct); + u16 mask = cgroup_control(dsct, true); + + dsct->subtree_control &= mask; + dsct->subtree_bypass &= mask; dsct->subtree_ss_mask = cgroup_calc_subtree_ss_mask(dsct->subtree_control, - cgroup_ss_mask(dsct)); + cgroup_ss_mask(dsct, true)); } } @@ -2835,6 +2853,7 @@ static void cgroup_restore_control(struct cgroup *cgrp) cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) { dsct->subtree_control = dsct->old_subtree_control; dsct->subtree_ss_mask = dsct->old_subtree_ss_mask; + dsct->subtree_bypass = dsct->old_subtree_bypass; } } @@ -2843,9 +2862,9 @@ static bool css_visible(struct cgroup_subsys_state *css) struct cgroup_subsys *ss = css->ss; struct cgroup *cgrp = css->cgroup; - if (cgroup_control(cgrp) & (1 << ss->id)) + if (cgroup_control(cgrp, false) & (1 << ss->id)) return true; - if (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) + if (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) return false; return cgroup_on_dfl(cgrp) && ss->implicit_on_dfl; } @@ -2876,7 +2895,7 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp) WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt)); - if (!(cgroup_ss_mask(dsct) & (1 << ss->id))) + if (!(cgroup_ss_mask(dsct, false) & (1 << ss->id))) continue; if (!css) { @@ -2926,7 +2945,7 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp) continue; if (css->parent && - !(cgroup_ss_mask(dsct) & (1 << ss->id))) { + !(cgroup_ss_mask(dsct, false) & (1 << ss->id))) { kill_css(css); } else if (!css_visible(css)) { css_clear_dir(css); @@ -3038,7 +3057,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { - u16 enable = 0, disable = 0; + u16 enable = 0, disable = 0, bypass = 0; + u16 child_enable = 0; struct cgroup *cgrp, *child; struct cgroup_subsys *ss; char *tok; @@ -3059,10 +3079,16 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, if (*tok == '+') { enable |= 1 << ssid; + bypass &= ~(1 << ssid); disable &= ~(1 << ssid); } else if (*tok == '-') { disable |= 1 << ssid; enable &= ~(1 << ssid); + bypass &= ~(1 << ssid); + } else if (*tok == '#') { + bypass |= 1 << ssid; + enable &= ~(1 << ssid); + disable &= ~(1 << ssid); } else { return -EINVAL; } @@ -3076,35 +3102,35 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, if (!cgrp) return -ENODEV; - for_each_subsys(ss, ssid) { - if (enable & (1 << ssid)) { - if (cgrp->subtree_control & (1 << ssid)) { - enable &= ~(1 << ssid); - continue; - } + /* + * Cannot use controllers that aren't allowed. + */ + if (~cgroup_control(cgrp, true) & (enable|disable|bypass)) { + ret = -ENOENT; + goto out_unlock; + } - if (!(cgroup_control(cgrp) & (1 << ssid))) { - ret = -ENOENT; - goto out_unlock; - } - } else if (disable & (1 << ssid)) { - if (!(cgrp->subtree_control & (1 << ssid))) { - disable &= ~(1 << ssid); - continue; - } + /* + * Strip out redundant bits. + */ + enable &= ~cgrp->subtree_control; + bypass &= ~cgrp->subtree_bypass; + disable &= (cgrp->subtree_control|cgrp->subtree_bypass); - /* a child has it enabled? */ - cgroup_for_each_live_child(child, cgrp) { - if (child->subtree_control & (1 << ssid)) { - ret = -EBUSY; - goto out_unlock; - } - } - } + if (!(enable|bypass|disable)) { + ret = 0; + goto out_unlock; } - if (!enable && !disable) { - ret = 0; + + cgroup_for_each_live_child(child, cgrp) + child_enable |= child->subtree_control|child->subtree_bypass; + + /* + * Cannot change the state of a controller if enabled in children. + */ + if ((enable|bypass|disable) & child_enable) { + ret = -EBUSY; goto out_unlock; } @@ -3116,7 +3142,9 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, cgroup_save_control(cgrp); cgrp->subtree_control |= enable; - cgrp->subtree_control &= ~disable; + cgrp->subtree_control &= ~(bypass|disable); + cgrp->subtree_bypass |= bypass; + cgrp->subtree_bypass &= ~(enable|disable); ret = cgroup_apply_control(cgrp); @@ -4441,7 +4469,8 @@ static void css_release(struct percpu_ref *ref) } static void init_and_link_css(struct cgroup_subsys_state *css, - struct cgroup_subsys *ss, struct cgroup *cgrp) + struct cgroup_subsys *ss, struct cgroup *cgrp, + struct cgroup_subsys_state *parent_css) { lockdep_assert_held(&cgroup_mutex); @@ -4456,8 +4485,8 @@ static void init_and_link_css(struct cgroup_subsys_state *css, css->serial_nr = css_serial_nr_next++; atomic_set(&css->online_cnt, 0); - if (cgroup_parent(cgrp)) { - css->parent = cgroup_css(cgroup_parent(cgrp), ss); + if (parent_css) { + css->parent = parent_css; css_get(css->parent); } @@ -4520,19 +4549,26 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, struct cgroup_subsys *ss) { struct cgroup *parent = cgroup_parent(cgrp); - struct cgroup_subsys_state *parent_css = cgroup_css(parent, ss); + struct cgroup_subsys_state *parent_css = NULL; struct cgroup_subsys_state *css; int err; lockdep_assert_held(&cgroup_mutex); + /* + * As cgroup may be in bypass mode, need to skip over ancestor + * cgroups with NULL CSS. + */ + for (; parent && !parent_css; parent = cgroup_parent(parent)) + parent_css = cgroup_css(parent, ss); + css = ss->css_alloc(parent_css); if (!css) css = ERR_PTR(-ENOMEM); if (IS_ERR(css)) return css; - init_and_link_css(css, ss, cgrp); + init_and_link_css(css, ss, cgrp, parent_css); err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL); if (err) @@ -4634,7 +4670,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent) * subtree_control from the parent. Each is configured manually. */ if (!cgroup_on_dfl(cgrp)) - cgrp->subtree_control = cgroup_control(cgrp); + cgrp->subtree_control = cgroup_control(cgrp, false); if (parent) cgroup_bpf_inherit(cgrp, parent); @@ -4921,7 +4957,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early) css = ss->css_alloc(cgroup_css(&cgrp_dfl_root.cgrp, ss)); /* We don't handle early failures gracefully */ BUG_ON(IS_ERR(css)); - init_and_link_css(css, ss, &cgrp_dfl_root.cgrp); + init_and_link_css(css, ss, &cgrp_dfl_root.cgrp, NULL); /* * Root csses are never destroyed and we can't initialize -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
[parent not found: <1500669293-21792-3-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control [not found] ` <1500669293-21792-3-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2017-07-22 13:50 ` Tejun Heo [not found] ` <20170722135030.GC3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Tejun Heo @ 2017-07-22 13:50 UTC (permalink / raw) To: Waiman Long Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg Hello, Waiman. On Fri, Jul 21, 2017 at 04:34:51PM -0400, Waiman Long wrote: > The special prefix '#' attached to a controller name can now be written > into the cgroup.subtree_control file to set that controller in bypass > mode in all the child cgroups. The controller will show up in the > children's cgroup.controllers file, but the corresponding control knobs > will be absent. However, that controller can be enabled or bypassed > in its children by writing to their respective subtree_control files. > > This mode can be useful to non-domain controllers or controllers where > there are costs to each additional layer of hierarchy. This mode will > also allow more freedom in how each controller can shape its effective > hierarchy independent of each others. While this continues to be an interesting idea. I'm still having a bit of hard time with the change. The biggest blocks are * As raised a couple times before, how would this work in terms of resource ownership and delegation? The last time we spoke about this, I felt that we were mostly talking past each other. I think it'd really help to think about / explain how this would work with delegation to clarify who owns what. * While the idea is interesting, I think we need more concrete usecases to justify the addition and make sure that we aren't doing something misguided. Can you please illustrate / give examples of how this would be useful? Thanks. -- tejun ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20170722135030.GC3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control [not found] ` <20170722135030.GC3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-07-24 18:20 ` Waiman Long 2017-07-25 17:13 ` Tejun Heo 0 siblings, 1 reply; 16+ messages in thread From: Waiman Long @ 2017-07-24 18:20 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg On 07/22/2017 09:50 AM, Tejun Heo wrote: > Hello, Waiman. > > On Fri, Jul 21, 2017 at 04:34:51PM -0400, Waiman Long wrote: >> The special prefix '#' attached to a controller name can now be written >> into the cgroup.subtree_control file to set that controller in bypass >> mode in all the child cgroups. The controller will show up in the >> children's cgroup.controllers file, but the corresponding control knobs >> will be absent. However, that controller can be enabled or bypassed >> in its children by writing to their respective subtree_control files. >> >> This mode can be useful to non-domain controllers or controllers where >> there are costs to each additional layer of hierarchy. This mode will >> also allow more freedom in how each controller can shape its effective >> hierarchy independent of each others. > While this continues to be an interesting idea. I'm still having a > bit of hard time with the change. The biggest blocks are > > * As raised a couple times before, how would this work in terms of > resource ownership and delegation? The last time we spoke about > this, I felt that we were mostly talking past each other. I think > it'd really help to think about / explain how this would work with > delegation to clarify who owns what. As said in patch 3, enabling bypass mode at subtree_control delegate the authority of enabling controllers to the children. The children own the resource control files directly. It will be more straight forward to explain if bypass mode can only be used consistently from the root down. Having a mix of regular enable and bypass down the tree will be more tricky to talk about. > * While the idea is interesting, I think we need more concrete > usecases to justify the addition and make sure that we aren't doing > something misguided. Can you please illustrate / give examples of > how this would be useful? Bypass mode targets mainly non-domain controllers and controllers that have cost associated with each additional level of hierarchy (e.g. cpu). I believe the end goal of cgroup v2 is to have all controllers migrated to it eventually. Consider the following: A / \ B C / \ / \ D E F G Controller X may want (A, B, C) to be controlled as one group with one set of control files whereas D, E, F, G will have their own control files. Controller Y may want all of them have their own control files. Bypass mode allows us to do that. With more and more controllers enabled in v2, the chance of this kind of configuration conflicts is going up. I am willing to take a more limited form of bypass mode that have to be either enabled (+) or bypass (#) only from the root down for the time being and then consider allowing their mixing later on if you think it is more acceptable to you. Cheers, Longman ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control 2017-07-24 18:20 ` Waiman Long @ 2017-07-25 17:13 ` Tejun Heo [not found] ` <20170725171343.GB3216015-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> 2017-08-01 14:29 ` Waiman Long 0 siblings, 2 replies; 16+ messages in thread From: Tejun Heo @ 2017-07-25 17:13 UTC (permalink / raw) To: Waiman Long Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro Hello, Waiman. On Mon, Jul 24, 2017 at 02:20:59PM -0400, Waiman Long wrote: > As said in patch 3, enabling bypass mode at subtree_control delegate the > authority of enabling controllers to the children. The children own the > resource control files directly. It will be more straight forward to But that doesn't work at all because such child would end up controlling the distribution of an ancestor's resources. It breaks a fundamental property of the hierarchy. > explain if bypass mode can only be used consistently from the root down. > Having a mix of regular enable and bypass down the tree will be more > tricky to talk about. Hmmm... it isn't just being tricky. As proposed, it is in direct conflict with the basic semantics of the resource hierarchy. > > * While the idea is interesting, I think we need more concrete > > usecases to justify the addition and make sure that we aren't doing > > something misguided. Can you please illustrate / give examples of > > how this would be useful? > > Bypass mode targets mainly non-domain controllers and controllers that > have cost associated with each additional level of hierarchy (e.g. cpu). > I believe the end goal of cgroup v2 is to have all controllers migrated > to it eventually. Consider the following: > > A > / \ > B C > / \ / \ > D E F G > > Controller X may want (A, B, C) to be controlled as one group with one > set of control files whereas D, E, F, G will have their own control > files. Controller Y may want all of them have their own control files. > Bypass mode allows us to do that. With more and more controllers enabled > in v2, the chance of this kind of configuration conflicts is going up. I think I understand what it wants to do but I think it's still lacking justfications given how invasive the change is to the basic operation and usage. We need more than one can think of this and it can help with certain hypothetical use cases. ie. along the line of what the actual use cases are, what our overhead looks like and why, and why the problem can't be solved in a different, hopefully less intrusive, way. Thanks. -- tejun ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20170725171343.GB3216015-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control [not found] ` <20170725171343.GB3216015-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-07-25 19:10 ` Waiman Long 0 siblings, 0 replies; 16+ messages in thread From: Waiman Long @ 2017-07-25 19:10 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg, pjt-hpIqsD4AKlfQT0dZR+AlfA, luto-kltTT9wpgjJwATOyAt5JVQ, efault-Mmb7MZpHnFY, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, guro-b10kYP2dOMg On 07/25/2017 01:13 PM, Tejun Heo wrote: > Hello, Waiman. > > On Mon, Jul 24, 2017 at 02:20:59PM -0400, Waiman Long wrote: >> As said in patch 3, enabling bypass mode at subtree_control delegate the >> authority of enabling controllers to the children. The children own the >> resource control files directly. It will be more straight forward to > But that doesn't work at all because such child would end up > controlling the distribution of an ancestor's resources. It breaks a > fundamental property of the hierarchy. > >> explain if bypass mode can only be used consistently from the root down. >> Having a mix of regular enable and bypass down the tree will be more >> tricky to talk about. > Hmmm... it isn't just being tricky. As proposed, it is in direct > conflict with the basic semantics of the resource hierarchy. > >>> * While the idea is interesting, I think we need more concrete >>> usecases to justify the addition and make sure that we aren't doing >>> something misguided. Can you please illustrate / give examples of >>> how this would be useful? >> Bypass mode targets mainly non-domain controllers and controllers that >> have cost associated with each additional level of hierarchy (e.g. cpu). >> I believe the end goal of cgroup v2 is to have all controllers migrated >> to it eventually. Consider the following: >> >> A >> / \ >> B C >> / \ / \ >> D E F G >> >> Controller X may want (A, B, C) to be controlled as one group with one >> set of control files whereas D, E, F, G will have their own control >> files. Controller Y may want all of them have their own control files. >> Bypass mode allows us to do that. With more and more controllers enabled >> in v2, the chance of this kind of configuration conflicts is going up. > I think I understand what it wants to do but I think it's still > lacking justfications given how invasive the change is to the basic > operation and usage. We need more than one can think of this and it > can help with certain hypothetical use cases. ie. along the line of > what the actual use cases are, what our overhead looks like and why, > and why the problem can't be solved in a different, hopefully less > intrusive, way. As I said above that bypass mode can be useful for non-domain controllers. For example, controllers like net_cls, net_prio just provides an ID for classification. There is no resource for the parent to control or distribute. We can, of course, make them implicit like perf_event and activate them in all the cgroups. Alternatively, we can use bypass and what let whatever cgroups that need it activate one for themselves to avoid proliferation of unused IDs. Another use case is the cpu controller. As discussed a while before, scheduler intensive workload wills suffer with each additional level of hierarchy even if nothing is running at the same time. Image the following hierarchy: R / \ A B / \ / \ X Y W Z Supoose that memory consumption is the bottleneck and we use memory controller to distribute memory resources to tasks in X, Y, W, Z. Also suppose we have sufficient CPU resources available that we don't care much about how much CPU they uses. Now, if tasks in cgroup X want to use cpu controller to restrict the amount of CPU available to a subgroup of tasks. Currently, the only way to do that is to have cpu controller enabled all the way down from the root. Now all the tasks in Y, W & Z will have to suffer the additional performance overhead of this extra levels of cpu controller hierarchy. We also need to figure out how much CPU resource will need to be partitioned among cgroups. With bypass mode, you only need to activate the CPU controllers where it is needed. The tasks in the other cgroups can just compete directly with each other without worrying about resource partition and hierarchical overhead. This is one example of the conflict of hierarchy problem I mentioned before. Even though I wrote that the children own the control files in their cgroup in bypass mode, it is mainly a conceptual framework for discussion purpose from my perspective. In reality, it is a matter of who has the permission to write to cgroup.controller to re-enable a bypassed controller and the corresponding controller files. So we can define it either ways (by parent or by child) to best fit the narrative that we want to convey to the cgroup users. I don't care much about the narrative. I care more about what capability and flexibility that can be made available to the cgroup users. In your nsdelegate mount option patch, only cgroup.procs and cgroup.subtree_control are to be written by delegatees. So unless we extend it to other control files, those other files are still practically owned by the parent. Cheers, Longman ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control 2017-07-25 17:13 ` Tejun Heo [not found] ` <20170725171343.GB3216015-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org> @ 2017-08-01 14:29 ` Waiman Long 1 sibling, 0 replies; 16+ messages in thread From: Waiman Long @ 2017-08-01 14:29 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro On 07/25/2017 01:13 PM, Tejun Heo wrote: >> >> Bypass mode targets mainly non-domain controllers and controllers that >> have cost associated with each additional level of hierarchy (e.g. cpu). >> I believe the end goal of cgroup v2 is to have all controllers migrated >> to it eventually. Consider the following: >> >> A >> / \ >> B C >> / \ / \ >> D E F G >> >> Controller X may want (A, B, C) to be controlled as one group with one >> set of control files whereas D, E, F, G will have their own control >> files. Controller Y may want all of them have their own control files. >> Bypass mode allows us to do that. With more and more controllers enabled >> in v2, the chance of this kind of configuration conflicts is going up. > I think I understand what it wants to do but I think it's still > lacking justfications given how invasive the change is to the basic > operation and usage. We need more than one can think of this and it I don't think it is as invasive as you have suggested. The latest patch doesn't change the semantics of the cgroup control operation as long as the bypass mode isn't used in subtree_control. A major feature enabled by this mode is that instead of enabling a controller in all its children, the bypass mode allows us to choose which children will have the controller enabled. As for the ownership question, I am going to change the description so that control files in the children are still going to be owned by the parent to minimize changes to basic operation. > can help with certain hypothetical use cases. ie. along the line of > what the actual use cases are, what our overhead looks like and why, > and why the problem can't be solved in a different, hopefully less > intrusive, way. A common use case is an application that want to use cpuset, for example, to bind some worker threads to individual cpus. At the same time, the application may also want to use cpu controller to limit the amount of cpu consumed by some other threads. Right now, the only way to do that with the current v2 control scheme is to create child cgroups with both cpu and cpuset controllers enabled and put the desired processes or threads into those child cgroups. The cost of enabling cpuset on a task that need cpu controller is negligible. However, the cost of enabling cpu controller on tasks that only need cpuset can be noticeable. The performance difference may become a concern for users to move from cgroup v1 to v2. If our goal is to retire cgroup v1 eventually, we need to make cgroup v2 as performant as cgroup v1. The cpu controller also have a higher memory cost than other cgroup controller due to the fact that it requires percpu runqueue structure that is pretty big especially on system with many cpus. As a result, more memory will be wasted on tasks that only need cpuset. With bypass mode, you can put tasks that need cpu controller into a child cgroups with only cpu controller enabled and put tasks that need cpuset into child cgroups with only cpuset enabled, similar to what users can now do on cgroup v1. I think there is enough justification to have the bypass mode feature in cgroup v2. I would like to know what other concern do you have with this feature. Cheers, Longman ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 3/4] cgroup: Allow reenabling of controller in bypass mode 2017-07-21 20:34 [PATCH v2 0/4] cgroup: Introducing bypass mode Waiman Long [not found] ` <1500669293-21792-1-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2017-07-21 20:34 ` [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control Waiman Long @ 2017-07-21 20:34 ` Waiman Long 2017-07-21 20:34 ` [PATCH v2 4/4] cgroup: Make debug controller report new controller masks Waiman Long 3 siblings, 0 replies; 16+ messages in thread From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw) To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro, Waiman Long Controllers set to bypass mode in the parent's "cgroup.subtree_control" can now be optionally enabled by writing the controller name with the '+' prefix to "cgroup.controllers". Using the '#' prefix will reset it back to the bypass state. This capability increases the flexibility each controller has in shaping the effective cgroup hierarchy to best suit its need. Signed-off-by: Waiman Long <longman@redhat.com> --- Documentation/cgroup-v2.txt | 23 +++++++++- include/linux/cgroup-defs.h | 7 +++ kernel/cgroup/cgroup.c | 109 ++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 134 insertions(+), 5 deletions(-) diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index f17a74b..efb69c4 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -395,6 +395,18 @@ prefixed controller interface files from C and D. This means that the controller interface files - anything which doesn't start with "cgroup." are owned by the parent rather than the cgroup itself. +Once a controller is put into bypass mode in "cgroup.subtree_control", +the cgroup's children can optionally enable this controller by writing +the controller name with the '+ prefix into "cgroup.controllers". +In this case, the controller interface files are considered to be +owned by the child cgroup itself, not by its parent. Therefore, +setting the bypass mode in "cgroup.subtree_control" means delegating +the authority of enabling or disabling the controller interface files +to its children. Writing the controller name with the '#' prefix into +"cgroup.controllers" resets the state back to bypass mode. The state +of a controller cannot be changed if it is enabled or bypassed in its +"cgroup.subtree_control". + Cgroup Hierarchy ~~~~~~~~~~~~~~~~ @@ -859,11 +871,18 @@ All cgroup core files are prefixed with "cgroup." should be granted along with the containing directory. cgroup.controllers - A read-only space separated values file which exists on all + A read-write space separated values file which exists on all cgroups. It shows space separated list of all controllers available to - the cgroup. The controllers are not ordered. + the cgroup. Controller names with '#' prefix are in bypass + mode. The controllers are not ordered. + + When a controller is set into bypass mode in its parent's + "cgroup.subtree_control", its name prefixed with '+' or '#' + can be written to enable it or reset it back to bypass mode + respectively. Controllers not in bypass mode are not allowed + to be written. cgroup.subtree_control A read-write space separated values file which exists on all diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 3cac6d0..25c2ac8 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -308,6 +308,13 @@ struct cgroup { u16 old_subtree_ss_mask; u16 old_subtree_bypass; + /* + * The bitmask of subsystems that are set in its parent's + * ->subtree_bypass and explictly enabled in this cgroup. + */ + u16 enable_ss_mask; + u16 old_enable_ss_mask; + /* Private pointers for each registered subsystem */ struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT]; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 1e7feae..358d8b3 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -420,7 +420,7 @@ static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass) u16 root_ss_mask = cgrp->root->subsys_mask; if (parent) { - u16 ss_mask = parent->subtree_control; + u16 ss_mask = parent->subtree_control|cgrp->enable_ss_mask; if (show_bypass) ss_mask |= parent->subtree_bypass; @@ -443,7 +443,7 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass) struct cgroup *parent = cgroup_parent(cgrp); if (parent) { - u16 ss_mask = parent->subtree_ss_mask; + u16 ss_mask = parent->subtree_ss_mask|cgrp->enable_ss_mask; if (show_bypass) @@ -2811,6 +2811,7 @@ static void cgroup_save_control(struct cgroup *cgrp) dsct->old_subtree_control = dsct->subtree_control; dsct->old_subtree_ss_mask = dsct->subtree_ss_mask; dsct->old_subtree_bypass = dsct->subtree_bypass; + dsct->old_enable_ss_mask = dsct->enable_ss_mask; } } @@ -2854,6 +2855,7 @@ static void cgroup_restore_control(struct cgroup *cgrp) dsct->subtree_control = dsct->old_subtree_control; dsct->subtree_ss_mask = dsct->old_subtree_ss_mask; dsct->subtree_bypass = dsct->old_subtree_bypass; + dsct->enable_ss_mask = dsct->old_enable_ss_mask; } } @@ -3124,7 +3126,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, cgroup_for_each_live_child(child, cgrp) - child_enable |= child->subtree_control|child->subtree_bypass; + child_enable |= child->subtree_control|child->subtree_bypass| + child->enable_ss_mask; /* * Cannot change the state of a controller if enabled in children. @@ -3157,6 +3160,105 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of, return ret ?: nbytes; } +/* + * Change bypass status of controllers for a cgroup in the default hierarchy. + */ +static ssize_t cgroup_controllers_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + u16 enable = 0, bypass = 0; + struct cgroup *cgrp, *parent; + struct cgroup_subsys *ss; + char *tok; + int ssid, ret; + + /* + * Parse input - space separated list of subsystem names prefixed + * with either + or #. + */ + buf = strstrip(buf); + while ((tok = strsep(&buf, " "))) { + if (tok[0] == '\0') + continue; + do_each_subsys_mask(ss, ssid, ~cgrp_dfl_inhibit_ss_mask) { + if (!cgroup_ssid_enabled(ssid) || + strcmp(tok + 1, ss->name)) + continue; + + if (*tok == '+') { + enable |= 1 << ssid; + bypass &= ~(1 << ssid); + } else if (*tok == '#') { + bypass |= 1 << ssid; + enable &= ~(1 << ssid); + } else { + return -EINVAL; + } + break; + } while_each_subsys_mask(); + if (ssid == CGROUP_SUBSYS_COUNT) + return -EINVAL; + } + + cgrp = cgroup_kn_lock_live(of->kn, true); + if (!cgrp) + return -ENODEV; + + /* + * Write to root cgroup's controllers file is not allowed. + */ + parent = cgroup_parent(cgrp); + if (!parent) { + ret = -EINVAL; + goto out_unlock; + } + + /* + * Only controllers set into bypass mode in the parent cgroup + * can be specified here. + */ + if (~parent->subtree_bypass & (enable|bypass)) { + ret = -ENOENT; + goto out_unlock; + } + + /* + * Mask off irrelevant bits. + */ + enable &= ~cgrp->enable_ss_mask; + bypass &= cgrp->enable_ss_mask; + + if (!(enable|bypass)) { + ret = 0; + goto out_unlock; + } + + /* + * We cannot change the bypass state of a controller that is enabled + * in subtree_control. + */ + if ((cgrp->subtree_control|cgrp->subtree_bypass) & (enable|bypass)) { + ret = -EBUSY; + goto out_unlock; + } + + /* Save and update control masks and prepare csses */ + cgroup_save_control(cgrp); + + cgrp->enable_ss_mask |= enable; + cgrp->enable_ss_mask &= ~bypass; + + ret = cgroup_apply_control(cgrp); + cgroup_finalize_control(cgrp, ret); + kernfs_activate(cgrp->kn); + ret = 0; + +out_unlock: + cgroup_kn_unlock(of->kn); + return ret ?: nbytes; +} + static int cgroup_enable_threaded(struct cgroup *cgrp) { struct cgroup *parent = cgroup_parent(cgrp); @@ -4322,6 +4424,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of, { .name = "cgroup.controllers", .seq_show = cgroup_controllers_show, + .write = cgroup_controllers_write, }, { .name = "cgroup.subtree_control", -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 4/4] cgroup: Make debug controller report new controller masks 2017-07-21 20:34 [PATCH v2 0/4] cgroup: Introducing bypass mode Waiman Long ` (2 preceding siblings ...) 2017-07-21 20:34 ` [PATCH v2 3/4] cgroup: Allow reenabling of controller in bypass mode Waiman Long @ 2017-07-21 20:34 ` Waiman Long 3 siblings, 0 replies; 16+ messages in thread From: Waiman Long @ 2017-07-21 20:34 UTC (permalink / raw) To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar Cc: cgroups, linux-kernel, kernel-team, pjt, luto, efault, torvalds, guro, Waiman Long The newly added cgroup controller masks (subtree_bypass and enable_ss_mask) are now being reported in the debug.masks controller file. Signed-off-by: Waiman Long <longman@redhat.com> --- kernel/cgroup/debug.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c index f661b4c..5f35a76 100644 --- a/kernel/cgroup/debug.c +++ b/kernel/cgroup/debug.c @@ -262,6 +262,8 @@ static int cgroup_masks_read(struct seq_file *seq, void *v) cgroup_masks_read_one(seq, "subtree_control", cgrp->subtree_control); cgroup_masks_read_one(seq, "subtree_ss_mask", cgrp->subtree_ss_mask); + cgroup_masks_read_one(seq, "subtree_bypass", cgrp->subtree_bypass); + cgroup_masks_read_one(seq, "enable_ss_mask", cgrp->enable_ss_mask); cgroup_kn_unlock(of->kn); return 0; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-08-01 14:29 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-21 20:34 [PATCH v2 0/4] cgroup: Introducing bypass mode Waiman Long
[not found] ` <1500669293-21792-1-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-07-21 20:34 ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long
2017-07-22 13:43 ` Tejun Heo
[not found] ` <20170722134358.GB3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2017-07-23 12:18 ` [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode Tejun Heo
[not found] ` <20170723121826.GC1498614-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2017-07-24 19:14 ` Waiman Long
[not found] ` <c2637f47-3e90-a322-f84b-0c913da14977-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-07-25 17:22 ` [PATCH cgroup/for-4.14] cgroup: add comment to cgroup_enable_threaded() Tejun Heo
2017-07-25 17:16 ` [PATCH] cgroup: remove unnecessary empty check when enabling threaded mode Tejun Heo
2017-07-24 17:48 ` [PATCH v2 1/4] cgroup: Child cgroup creation not allowed on invalid domain Waiman Long
2017-07-21 20:34 ` [PATCH v2 2/4] cgroup: Allow bypass mode in subtree_control Waiman Long
[not found] ` <1500669293-21792-3-git-send-email-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-07-22 13:50 ` Tejun Heo
[not found] ` <20170722135030.GC3329631-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2017-07-24 18:20 ` Waiman Long
2017-07-25 17:13 ` Tejun Heo
[not found] ` <20170725171343.GB3216015-4dN5La/x3IkLX0oZNxdnEQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
2017-07-25 19:10 ` Waiman Long
2017-08-01 14:29 ` Waiman Long
2017-07-21 20:34 ` [PATCH v2 3/4] cgroup: Allow reenabling of controller in bypass mode Waiman Long
2017-07-21 20:34 ` [PATCH v2 4/4] cgroup: Make debug controller report new controller masks Waiman Long
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).