* [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
@ 2013-06-14 3:47 Tejun Heo
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2013-06-14 3:47 UTC (permalink / raw)
To: Li Zefan
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering
cgroup's rename(2) isn't a proper migration implementation - it can't
move the cgroup to a different parent in the hierarchy. All it can do
is swapping the name string for that cgroup. This isn't useful and
can mislead users to think that cgroup supports proper cgroup-level
migration. Disallow rename(2) if sane_behavior.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 1760476..f975227 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -270,6 +270,8 @@ enum {
* - "release_agent" and "notify_on_release" are removed.
* Replacement notification mechanism will be implemented.
*
+ * - rename(2) is disallowed.
+ *
* - memcg: use_hierarchy is on by default and the cgroup file for
* the flag is not created.
*/
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2e9da7b..7d78902 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2508,6 +2508,13 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
cgrp = __d_cgrp(old_dentry);
+ /*
+ * This isn't a proper migration and its usefulness is very
+ * limited. Disallow if sane_behavior.
+ */
+ if (cgroup_sane_behavior(cgrp))
+ return -EINVAL;
+
name = cgroup_alloc_name(new_dentry);
if (!name)
return -ENOMEM;
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-06-14 7:44 ` Li Zefan
2013-06-14 18:18 ` [PATCH v2 " Tejun Heo
2013-06-16 7:16 ` [PATCH " Lennart Poettering
2 siblings, 0 replies; 14+ messages in thread
From: Li Zefan @ 2013-06-14 7:44 UTC (permalink / raw)
To: Tejun Heo
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering
On 2013/6/14 11:47, Tejun Heo wrote:
> cgroup's rename(2) isn't a proper migration implementation - it can't
> move the cgroup to a different parent in the hierarchy. All it can do
> is swapping the name string for that cgroup. This isn't useful and
> can mislead users to think that cgroup supports proper cgroup-level
> migration. Disallow rename(2) if sane_behavior.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> include/linux/cgroup.h | 2 ++
> kernel/cgroup.c | 7 +++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 1760476..f975227 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -270,6 +270,8 @@ enum {
> * - "release_agent" and "notify_on_release" are removed.
> * Replacement notification mechanism will be implemented.
> *
> + * - rename(2) is disallowed.
> + *
> * - memcg: use_hierarchy is on by default and the cgroup file for
> * the flag is not created.
> */
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 2e9da7b..7d78902 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2508,6 +2508,13 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
>
> cgrp = __d_cgrp(old_dentry);
>
> + /*
> + * This isn't a proper migration and its usefulness is very
> + * limited. Disallow if sane_behavior.
> + */
> + if (cgroup_sane_behavior(cgrp))
> + return -EINVAL;
> +
Isn't -EPERM a better errno? VFS will return -EPERM if inode->rename is NULL.
> name = cgroup_alloc_name(new_dentry);
> if (!name)
> return -ENOMEM;
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-14 7:44 ` Li Zefan
@ 2013-06-14 18:18 ` Tejun Heo
[not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-16 7:16 ` [PATCH " Lennart Poettering
2 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2013-06-14 18:18 UTC (permalink / raw)
To: Li Zefan
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering
cgroup's rename(2) isn't a proper migration implementation - it can't
move the cgroup to a different parent in the hierarchy. All it can do
is swapping the name string for that cgroup. This isn't useful and
can mislead users to think that cgroup supports proper cgroup-level
migration. Disallow rename(2) if sane_behavior.
v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs
return value when ->rename is not implemented as suggested by Li.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 1760476..f975227 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -270,6 +270,8 @@ enum {
* - "release_agent" and "notify_on_release" are removed.
* Replacement notification mechanism will be implemented.
*
+ * - rename(2) is disallowed.
+ *
* - memcg: use_hierarchy is on by default and the cgroup file for
* the flag is not created.
*/
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2e9da7b..c2c64005 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2508,6 +2508,13 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
cgrp = __d_cgrp(old_dentry);
+ /*
+ * This isn't a proper migration and its usefulness is very
+ * limited. Disallow if sane_behavior.
+ */
+ if (cgroup_sane_behavior(cgrp))
+ return -EPERM;
+
name = cgroup_alloc_name(new_dentry);
if (!name)
return -ENOMEM;
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-14 7:44 ` Li Zefan
2013-06-14 18:18 ` [PATCH v2 " Tejun Heo
@ 2013-06-16 7:16 ` Lennart Poettering
[not found] ` <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
2 siblings, 1 reply; 14+ messages in thread
From: Lennart Poettering @ 2013-06-16 7:16 UTC (permalink / raw)
To: Tejun Heo
Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers
On Thu, 13.06.13 20:47, Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org) wrote:
> cgroup's rename(2) isn't a proper migration implementation - it can't
> move the cgroup to a different parent in the hierarchy. All it can do
> is swapping the name string for that cgroup. This isn't useful and
> can mislead users to think that cgroup supports proper cgroup-level
> migration. Disallow rename(2) if sane_behavior.
Note that in the long run I'd really like to see functionality added to
move a full cgroup subtree to a different point in the tree, so that we
can migrate containers and such atomically from one partition to
another, without the container having to be aware of this (which of
course implies that we have a somewhat more useful cgroup namespacing
solution in place).
Lennart
--
Lennart Poettering - Red Hat, Inc.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
@ 2013-06-16 22:15 ` Tejun Heo
[not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2013-06-16 22:15 UTC (permalink / raw)
To: Lennart Poettering
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Michal Hocko, Johannes Weiner,
cgroups-u79uwXL29TY76Z2rM5mHXA
Hey, Lennart.
On Sun, Jun 16, 2013 at 09:16:48AM +0200, Lennart Poettering wrote:
> Note that in the long run I'd really like to see functionality added to
> move a full cgroup subtree to a different point in the tree, so that we
> can migrate containers and such atomically from one partition to
As some configurations are still affected by who one's parent is, we
can't do migration using rename(2) right now. Some configurations
which are legitimate under the current parent might be invalid when
put under a different parent. For migration to work, all
configurations should be composable, which currently isn't true. Off
the top of my head, I think the offending ones are cpuset and
device_cgroup, and I *think* the scheme Michal proposed at LSF breaks
composability.
In general, configuration composability is something we want. We
don't want descendant configs being updated or becoming invalid as
configurations of an ancestor change - any configuration should be
valid. The effects of some configurations could be no-op depending on
how the ancestors are configured, but that's just the configuration
being interpreted within the limits of its ancestors.
Li is working to make cpuset composable. As for device_cgroup, I'm
not sure what to do. Aristeu, I know you already put in too much work
into devcg and I'm sorry that I'm raising this now but would you be
interested in tackling this too?
We want composability regardless of migration, but as for migration
itself, an alternative could be having an atomic "move everything in
cgroup A to cgroup B" interface, so that the admin (whether human or
base system software) can set up a new cgroup and then move the member
tasks atomically.
> another, without the container having to be aware of this (which of
> course implies that we have a somewhat more useful cgroup namespacing
> solution in place).
But such approach would definitely be visible to containers. :(
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-06-17 13:00 ` Aristeu Rozanski
[not found] ` <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-06-17 13:51 ` Michal Hocko
2013-06-17 16:33 ` Lennart Poettering
2 siblings, 1 reply; 14+ messages in thread
From: Aristeu Rozanski @ 2013-06-17 13:00 UTC (permalink / raw)
To: Tejun Heo
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Michal Hocko, Lennart Poettering, Johannes Weiner,
cgroups-u79uwXL29TY76Z2rM5mHXA
On Sun, Jun 16, 2013 at 03:15:56PM -0700, Tejun Heo wrote:
> Li is working to make cpuset composable. As for device_cgroup, I'm
> not sure what to do. Aristeu, I know you already put in too much work
> into devcg and I'm sorry that I'm raising this now but would you be
> interested in tackling this too?
sure, I'm on it. that sounds like one of the versions of the hierarchy
patchset - having the local configuration saved but only applying what's
possible based on parent's state.
--
Aristeu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-17 13:00 ` Aristeu Rozanski
@ 2013-06-17 13:51 ` Michal Hocko
[not found] ` <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-17 16:33 ` Lennart Poettering
2 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2013-06-17 13:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Lennart Poettering, Li Zefan,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers,
Johannes Weiner, Aristeu Rozanski
On Sun 16-06-13 15:15:56, Tejun Heo wrote:
> Hey, Lennart.
>
> On Sun, Jun 16, 2013 at 09:16:48AM +0200, Lennart Poettering wrote:
> > Note that in the long run I'd really like to see functionality added to
> > move a full cgroup subtree to a different point in the tree, so that we
> > can migrate containers and such atomically from one partition to
>
> As some configurations are still affected by who one's parent is, we
> can't do migration using rename(2) right now.
> Some configurations which are legitimate under the current parent
> might be invalid when put under a different parent.
yes, for example all configurations where old parent is more restrictive
than the new one. For example. hardlimit in memcg or even more
oom_control, swappiness or use_hierarchy which are expected to be
consistent down the hierarchy.
> For migration to
> work, all configurations should be composable, which currently isn't
> true. Off the top of my head, I think the offending ones are cpuset
> and device_cgroup, and I *think* the scheme Michal proposed at LSF
> breaks composability.
The biggest problem I can see is how the core cgroup code know when it is
OK to migrate. There might be some ongoing operations that depend on the
current tree structure. For example the hierarchical reclaim or oom
etc..
I do not think that soft reclaim which I was talking about at LSF would
change anything here as it would be pretty much the same as the hard
limit. But that is not so important.
> In general, configuration composability is something we want. We
> don't want descendant configs being updated or becoming invalid as
> configurations of an ancestor change - any configuration should be
> valid. The effects of some configurations could be no-op depending on
> how the ancestors are configured, but that's just the configuration
> being interpreted within the limits of its ancestors.
>
> Li is working to make cpuset composable. As for device_cgroup, I'm
> not sure what to do. Aristeu, I know you already put in too much work
> into devcg and I'm sorry that I'm raising this now but would you be
> interested in tackling this too?
>
> We want composability regardless of migration, but as for migration
> itself, an alternative could be having an atomic "move everything in
> cgroup A to cgroup B" interface, so that the admin (whether human or
> base system software) can set up a new cgroup and then move the member
> tasks atomically.
>
> > another, without the container having to be aware of this (which of
> > course implies that we have a somewhat more useful cgroup namespacing
> > solution in place).
>
> But such approach would definitely be visible to containers. :(
>
> Thanks.
>
> --
> tejun
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-17 13:00 ` Aristeu Rozanski
2013-06-17 13:51 ` Michal Hocko
@ 2013-06-17 16:33 ` Lennart Poettering
[not found] ` <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
2 siblings, 1 reply; 14+ messages in thread
From: Lennart Poettering @ 2013-06-17 16:33 UTC (permalink / raw)
To: Tejun Heo
Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers,
Michal Hocko, Johannes Weiner, Aristeu Rozanski
On Sun, 16.06.13 15:15, Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org) wrote:
> We want composability regardless of migration, but as for migration
> itself, an alternative could be having an atomic "move everything in
> cgroup A to cgroup B" interface, so that the admin (whether human or
> base system software) can set up a new cgroup and then move the member
> tasks atomically.
I am not sure this would cut it for containers. For containers we'll
usually have a fully populated cgroup subtree, and if we want to migrate
that somewhere else, then we'd something that works recursively and
allows us to not tell the container at all about the move (i.e. the
namespace of cgroupfs the container sees should ideall stay entirely
unaltered by the move).
Lennart
--
Lennart Poettering - Red Hat, Inc.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2013-06-17 16:51 ` Tejun Heo
[not found] ` <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2013-06-17 16:51 UTC (permalink / raw)
To: Michal Hocko
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering, Johannes Weiner,
cgroups-u79uwXL29TY76Z2rM5mHXA
Hello, Michal.
On Mon, Jun 17, 2013 at 03:51:22PM +0200, Michal Hocko wrote:
> > Some configurations which are legitimate under the current parent
> > might be invalid when put under a different parent.
>
> yes, for example all configurations where old parent is more restrictive
> than the new one. For example. hardlimit in memcg or even more
I'm not following the hardlimit part. Shouldn't a given hardlimit
value have the same meaning regardless of where the node is located?
Its application will surely be different but its meaning would be the
same, no?
> oom_control, swappiness or use_hierarchy which are expected to be
> consistent down the hierarchy.
use_hierarchy is going away. Can you please explain how oom_control
and swappiness behave?
> The biggest problem I can see is how the core cgroup code know when it is
> OK to migrate. There might be some ongoing operations that depend on the
> current tree structure. For example the hierarchical reclaim or oom
> etc..
Internally, I think it should be implemented as task migrating to
another cgroup - IOW, to controllers, it'll appear the same as the
userland echoing the pid to cgroup.procs file on the new cgroup.
That's the only sane way to implement it as controllers need to do
everything which it does for the normal task migration case anyway.
Matching the impedance would be the responsibility of cgroup core.
> I do not think that soft reclaim which I was talking about at LSF would
> change anything here as it would be pretty much the same as the hard
> limit. But that is not so important.
I think it's very important to have trivially stackable
configurations. Maybe we can make more complex semantics work too but
it's gonna be confusing like hell when combined with the level of
automation we're hoping to achieve.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
@ 2013-06-17 16:53 ` Tejun Heo
0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2013-06-17 16:53 UTC (permalink / raw)
To: Lennart Poettering
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Michal Hocko, Johannes Weiner,
cgroups-u79uwXL29TY76Z2rM5mHXA
On Mon, Jun 17, 2013 at 06:33:53PM +0200, Lennart Poettering wrote:
> On Sun, 16.06.13 15:15, Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org) wrote:
>
> > We want composability regardless of migration, but as for migration
> > itself, an alternative could be having an atomic "move everything in
> > cgroup A to cgroup B" interface, so that the admin (whether human or
> > base system software) can set up a new cgroup and then move the member
> > tasks atomically.
>
> I am not sure this would cut it for containers. For containers we'll
> usually have a fully populated cgroup subtree, and if we want to migrate
> that somewhere else, then we'd something that works recursively and
> allows us to not tell the container at all about the move (i.e. the
> namespace of cgroupfs the container sees should ideall stay entirely
> unaltered by the move).
Right, I somehow completely forgot about the sub-hierarchy. We can
require the userland to set up the same sub-hierarchy with new configs
and do migration recursively but that's just yucky and there's no way
to make such operations invisible to the users of the sub-hierarchy.
If we're gonna do it, I agree the only sane way is properly
implementing migration via rename(2).
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-06-17 17:06 ` Tejun Heo
0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2013-06-17 17:06 UTC (permalink / raw)
To: Aristeu Rozanski
Cc: Lennart Poettering, Li Zefan,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers,
Michal Hocko, Johannes Weiner
Hello, Aristeu.
On Mon, Jun 17, 2013 at 09:00:29AM -0400, Aristeu Rozanski wrote:
> sure, I'm on it. that sounds like one of the versions of the hierarchy
> patchset - having the local configuration saved but only applying what's
> possible based on parent's state.
Yes, basically, any configuration should be allowed but its
application would be confined by its ancestors and a parent changing
configuration should propagate the effective limits down the hierarchy
but shouldn't affect the configurations directly - ie. if an ancestor
adds a device to the list of disallowed devices and then removes it
back, the whole subtree should return to the original state.
As we're introducing an incompatible interface with
cgroup_sane_behavior() anyway, if necessary, devcg can also change
behaviors dependent on the flag. Not encouraged but given the breadth
of the changes we're making with the flag, I don't think devcg would
stand out too much by doing so.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-06-18 9:11 ` Li Zefan
2013-06-18 15:16 ` Tejun Heo
1 sibling, 0 replies; 14+ messages in thread
From: Li Zefan @ 2013-06-18 9:11 UTC (permalink / raw)
To: Tejun Heo
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering
On 2013/6/15 2:18, Tejun Heo wrote:
> cgroup's rename(2) isn't a proper migration implementation - it can't
> move the cgroup to a different parent in the hierarchy. All it can do
> is swapping the name string for that cgroup. This isn't useful and
> can mislead users to think that cgroup supports proper cgroup-level
> migration. Disallow rename(2) if sane_behavior.
>
> v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs
> return value when ->rename is not implemented as suggested by Li.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
> include/linux/cgroup.h | 2 ++
> kernel/cgroup.c | 7 +++++++
> 2 files changed, 9 insertions(+)
>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-18 9:11 ` Li Zefan
@ 2013-06-18 15:16 ` Tejun Heo
1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2013-06-18 15:16 UTC (permalink / raw)
To: Li Zefan
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering
On Fri, Jun 14, 2013 at 11:18:22AM -0700, Tejun Heo wrote:
> cgroup's rename(2) isn't a proper migration implementation - it can't
> move the cgroup to a different parent in the hierarchy. All it can do
> is swapping the name string for that cgroup. This isn't useful and
> can mislead users to think that cgroup supports proper cgroup-level
> migration. Disallow rename(2) if sane_behavior.
>
> v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs
> return value when ->rename is not implemented as suggested by Li.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Applied to cgroup/for-3.11.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
[not found] ` <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-06-21 8:35 ` Michal Hocko
0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2013-06-21 8:35 UTC (permalink / raw)
To: Tejun Heo
Cc: Lennart Poettering, Li Zefan,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers,
Johannes Weiner, Aristeu Rozanski
On Mon 17-06-13 09:51:29, Tejun Heo wrote:
> Hello, Michal.
>
> On Mon, Jun 17, 2013 at 03:51:22PM +0200, Michal Hocko wrote:
> > > Some configurations which are legitimate under the current parent
> > > might be invalid when put under a different parent.
> >
> > yes, for example all configurations where old parent is more restrictive
> > than the new one. For example. hardlimit in memcg or even more
>
> I'm not following the hardlimit part. Shouldn't a given hardlimit
> value have the same meaning regardless of where the node is located?
> Its application will surely be different but its meaning would be the
> same, no?
Yes the hardlimit example was misleading. I was thinking about all the
consequences. Sorry about the confusion.
> > oom_control, swappiness or use_hierarchy which are expected to be
> > consistent down the hierarchy.
>
> use_hierarchy is going away. Can you please explain how oom_control
> and swappiness behave?
Both have to be consistent throughout the hierarchy currently.
The same swappiness is used for hierarchical reclaim at any
level. I plan to remove this restriction because this is really too
restrictive. We should be able to say that some groups in a hierarchy
shouldn't swap etc.
I am not sure about oom_control yet, though. OOM is handled at the first
hierarchy level which cannot be pushed down to its hard limit and the
whole subtree is oom frozen.
It would be little awkward if a group down the hierarchy hit its own
limit as well and fired OOM while the OOM is frozen up the hierarchy
and user space tries to find out the best candidate to kill in that
hierarchy:
A (under_oom=true) - waiting for userspace
\
.
.
.
\
B - hit the limit and call mem_cgroup_out_of_memory
On the other hand the handler should be prepared to handle exiting
tasks (which might change the oom situation) so the under_oom has to be
checked after a victim is selected and before it is killed to prevent
from an excessive killing.
It would be even more interesting if the situation was opposite. Child
has the oom disabled and waits for userspace to handle the situation.
A (oom_control=0)
/ \___________
/ \
B (charge) C (oom_control=1)
Parent gets under oom as well because a sibling pushes it to the
hard limit. Parent wouldn't be able to oom freeze the hierarchy
(mem_cgroup_oom_lock) because there is an ongoing oom down the hierarchy
so it would be basically waiting for userspace as well. This would be
quite unexpected. While the oom freezing could be tweaked to handle this
it sounds quite messy to me.
> > The biggest problem I can see is how the core cgroup code know when it is
> > OK to migrate. There might be some ongoing operations that depend on the
> > current tree structure. For example the hierarchical reclaim or oom
> > etc..
>
> Internally, I think it should be implemented as task migrating to
> another cgroup - IOW, to controllers, it'll appear the same as the
> userland echoing the pid to cgroup.procs file on the new cgroup.
OK, that should work. The controller still should have an option to say
that the current configuration is not compatible with its new parent.
E.g. for memcg if parent->oom_control != moving_memcg->oom_control so
the move would fail.
> That's the only sane way to implement it as controllers need to do
> everything which it does for the normal task migration case anyway.
> Matching the impedance would be the responsibility of cgroup core.
>
> > I do not think that soft reclaim which I was talking about at LSF would
> > change anything here as it would be pretty much the same as the hard
> > limit. But that is not so important.
>
> I think it's very important to have trivially stackable
> configurations. Maybe we can make more complex semantics work too but
> it's gonna be confusing like hell when combined with the level of
> automation we're hoping to achieve.
>
> Thanks.
>
> --
> tejun
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-06-21 8:35 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-14 3:47 [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior Tejun Heo
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-14 7:44 ` Li Zefan
2013-06-14 18:18 ` [PATCH v2 " Tejun Heo
[not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-18 9:11 ` Li Zefan
2013-06-18 15:16 ` Tejun Heo
2013-06-16 7:16 ` [PATCH " Lennart Poettering
[not found] ` <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
2013-06-16 22:15 ` Tejun Heo
[not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-17 13:00 ` Aristeu Rozanski
[not found] ` <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-06-17 17:06 ` Tejun Heo
2013-06-17 13:51 ` Michal Hocko
[not found] ` <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-17 16:51 ` Tejun Heo
[not found] ` <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-21 8:35 ` Michal Hocko
2013-06-17 16:33 ` Lennart Poettering
[not found] ` <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
2013-06-17 16:53 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).