* [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior
@ 2013-06-14 3:47 Tejun Heo
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2013-06-14 3:47 UTC (permalink / raw)
To: Li Zefan
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Kay Sievers, Lennart Poettering
cgroup's rename(2) isn't a proper migration implementation - it can't
move the cgroup to a different parent in the hierarchy. All it can do
is swapping the name string for that cgroup. This isn't useful and
can mislead users to think that cgroup supports proper cgroup-level
migration. Disallow rename(2) if sane_behavior.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 1760476..f975227 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -270,6 +270,8 @@ enum {
* - "release_agent" and "notify_on_release" are removed.
* Replacement notification mechanism will be implemented.
*
+ * - rename(2) is disallowed.
+ *
* - memcg: use_hierarchy is on by default and the cgroup file for
* the flag is not created.
*/
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2e9da7b..7d78902 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2508,6 +2508,13 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
cgrp = __d_cgrp(old_dentry);
+ /*
+ * This isn't a proper migration and its usefulness is very
+ * limited. Disallow if sane_behavior.
+ */
+ if (cgroup_sane_behavior(cgrp))
+ return -EINVAL;
+
name = cgroup_alloc_name(new_dentry);
if (!name)
return -ENOMEM;
^ permalink raw reply related [flat|nested] 14+ messages in thread[parent not found: <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> @ 2013-06-14 7:44 ` Li Zefan 2013-06-14 18:18 ` [PATCH v2 " Tejun Heo 2013-06-16 7:16 ` [PATCH " Lennart Poettering 2 siblings, 0 replies; 14+ messages in thread From: Li Zefan @ 2013-06-14 7:44 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Lennart Poettering On 2013/6/14 11:47, Tejun Heo wrote: > cgroup's rename(2) isn't a proper migration implementation - it can't > move the cgroup to a different parent in the hierarchy. All it can do > is swapping the name string for that cgroup. This isn't useful and > can mislead users to think that cgroup supports proper cgroup-level > migration. Disallow rename(2) if sane_behavior. > > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > --- > include/linux/cgroup.h | 2 ++ > kernel/cgroup.c | 7 +++++++ > 2 files changed, 9 insertions(+) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 1760476..f975227 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -270,6 +270,8 @@ enum { > * - "release_agent" and "notify_on_release" are removed. > * Replacement notification mechanism will be implemented. > * > + * - rename(2) is disallowed. > + * > * - memcg: use_hierarchy is on by default and the cgroup file for > * the flag is not created. > */ > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 2e9da7b..7d78902 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -2508,6 +2508,13 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry, > > cgrp = __d_cgrp(old_dentry); > > + /* > + * This isn't a proper migration and its usefulness is very > + * limited. Disallow if sane_behavior. > + */ > + if (cgroup_sane_behavior(cgrp)) > + return -EINVAL; > + Isn't -EPERM a better errno? VFS will return -EPERM if inode->rename is NULL. > name = cgroup_alloc_name(new_dentry); > if (!name) > return -ENOMEM; ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2013-06-14 7:44 ` Li Zefan @ 2013-06-14 18:18 ` Tejun Heo [not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org> 2013-06-16 7:16 ` [PATCH " Lennart Poettering 2 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2013-06-14 18:18 UTC (permalink / raw) To: Li Zefan Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Lennart Poettering cgroup's rename(2) isn't a proper migration implementation - it can't move the cgroup to a different parent in the hierarchy. All it can do is swapping the name string for that cgroup. This isn't useful and can mislead users to think that cgroup supports proper cgroup-level migration. Disallow rename(2) if sane_behavior. v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs return value when ->rename is not implemented as suggested by Li. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> --- include/linux/cgroup.h | 2 ++ kernel/cgroup.c | 7 +++++++ 2 files changed, 9 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 1760476..f975227 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -270,6 +270,8 @@ enum { * - "release_agent" and "notify_on_release" are removed. * Replacement notification mechanism will be implemented. * + * - rename(2) is disallowed. + * * - memcg: use_hierarchy is on by default and the cgroup file for * the flag is not created. */ diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2e9da7b..c2c64005 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2508,6 +2508,13 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry, cgrp = __d_cgrp(old_dentry); + /* + * This isn't a proper migration and its usefulness is very + * limited. Disallow if sane_behavior. + */ + if (cgroup_sane_behavior(cgrp)) + return -EPERM; + name = cgroup_alloc_name(new_dentry); if (!name) return -ENOMEM; ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>]
* Re: [PATCH v2 cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org> @ 2013-06-18 9:11 ` Li Zefan 2013-06-18 15:16 ` Tejun Heo 1 sibling, 0 replies; 14+ messages in thread From: Li Zefan @ 2013-06-18 9:11 UTC (permalink / raw) To: Tejun Heo Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Lennart Poettering On 2013/6/15 2:18, Tejun Heo wrote: > cgroup's rename(2) isn't a proper migration implementation - it can't > move the cgroup to a different parent in the hierarchy. All it can do > is swapping the name string for that cgroup. This isn't useful and > can mislead users to think that cgroup supports proper cgroup-level > migration. Disallow rename(2) if sane_behavior. > > v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs > return value when ->rename is not implemented as suggested by Li. > > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> > --- > include/linux/cgroup.h | 2 ++ > kernel/cgroup.c | 7 +++++++ > 2 files changed, 9 insertions(+) > Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org> 2013-06-18 9:11 ` Li Zefan @ 2013-06-18 15:16 ` Tejun Heo 1 sibling, 0 replies; 14+ messages in thread From: Tejun Heo @ 2013-06-18 15:16 UTC (permalink / raw) To: Li Zefan Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Lennart Poettering On Fri, Jun 14, 2013 at 11:18:22AM -0700, Tejun Heo wrote: > cgroup's rename(2) isn't a proper migration implementation - it can't > move the cgroup to a different parent in the hierarchy. All it can do > is swapping the name string for that cgroup. This isn't useful and > can mislead users to think that cgroup supports proper cgroup-level > migration. Disallow rename(2) if sane_behavior. > > v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs > return value when ->rename is not implemented as suggested by Li. > > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Applied to cgroup/for-3.11. Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2013-06-14 7:44 ` Li Zefan 2013-06-14 18:18 ` [PATCH v2 " Tejun Heo @ 2013-06-16 7:16 ` Lennart Poettering [not found] ` <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org> 2 siblings, 1 reply; 14+ messages in thread From: Lennart Poettering @ 2013-06-16 7:16 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers On Thu, 13.06.13 20:47, Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org) wrote: > cgroup's rename(2) isn't a proper migration implementation - it can't > move the cgroup to a different parent in the hierarchy. All it can do > is swapping the name string for that cgroup. This isn't useful and > can mislead users to think that cgroup supports proper cgroup-level > migration. Disallow rename(2) if sane_behavior. Note that in the long run I'd really like to see functionality added to move a full cgroup subtree to a different point in the tree, so that we can migrate containers and such atomically from one partition to another, without the container having to be aware of this (which of course implies that we have a somewhat more useful cgroup namespacing solution in place). Lennart -- Lennart Poettering - Red Hat, Inc. ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org> @ 2013-06-16 22:15 ` Tejun Heo [not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2013-06-16 22:15 UTC (permalink / raw) To: Lennart Poettering Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Michal Hocko, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA Hey, Lennart. On Sun, Jun 16, 2013 at 09:16:48AM +0200, Lennart Poettering wrote: > Note that in the long run I'd really like to see functionality added to > move a full cgroup subtree to a different point in the tree, so that we > can migrate containers and such atomically from one partition to As some configurations are still affected by who one's parent is, we can't do migration using rename(2) right now. Some configurations which are legitimate under the current parent might be invalid when put under a different parent. For migration to work, all configurations should be composable, which currently isn't true. Off the top of my head, I think the offending ones are cpuset and device_cgroup, and I *think* the scheme Michal proposed at LSF breaks composability. In general, configuration composability is something we want. We don't want descendant configs being updated or becoming invalid as configurations of an ancestor change - any configuration should be valid. The effects of some configurations could be no-op depending on how the ancestors are configured, but that's just the configuration being interpreted within the limits of its ancestors. Li is working to make cpuset composable. As for device_cgroup, I'm not sure what to do. Aristeu, I know you already put in too much work into devcg and I'm sorry that I'm raising this now but would you be interested in tackling this too? We want composability regardless of migration, but as for migration itself, an alternative could be having an atomic "move everything in cgroup A to cgroup B" interface, so that the admin (whether human or base system software) can set up a new cgroup and then move the member tasks atomically. > another, without the container having to be aware of this (which of > course implies that we have a somewhat more useful cgroup namespacing > solution in place). But such approach would definitely be visible to containers. :( Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> @ 2013-06-17 13:00 ` Aristeu Rozanski [not found] ` <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2013-06-17 13:51 ` Michal Hocko 2013-06-17 16:33 ` Lennart Poettering 2 siblings, 1 reply; 14+ messages in thread From: Aristeu Rozanski @ 2013-06-17 13:00 UTC (permalink / raw) To: Tejun Heo Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Michal Hocko, Lennart Poettering, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA On Sun, Jun 16, 2013 at 03:15:56PM -0700, Tejun Heo wrote: > Li is working to make cpuset composable. As for device_cgroup, I'm > not sure what to do. Aristeu, I know you already put in too much work > into devcg and I'm sorry that I'm raising this now but would you be > interested in tackling this too? sure, I'm on it. that sounds like one of the versions of the hierarchy patchset - having the local configuration saved but only applying what's possible based on parent's state. -- Aristeu ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-06-17 17:06 ` Tejun Heo 0 siblings, 0 replies; 14+ messages in thread From: Tejun Heo @ 2013-06-17 17:06 UTC (permalink / raw) To: Aristeu Rozanski Cc: Lennart Poettering, Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers, Michal Hocko, Johannes Weiner Hello, Aristeu. On Mon, Jun 17, 2013 at 09:00:29AM -0400, Aristeu Rozanski wrote: > sure, I'm on it. that sounds like one of the versions of the hierarchy > patchset - having the local configuration saved but only applying what's > possible based on parent's state. Yes, basically, any configuration should be allowed but its application would be confined by its ancestors and a parent changing configuration should propagate the effective limits down the hierarchy but shouldn't affect the configurations directly - ie. if an ancestor adds a device to the list of disallowed devices and then removes it back, the whole subtree should return to the original state. As we're introducing an incompatible interface with cgroup_sane_behavior() anyway, if necessary, devcg can also change behaviors dependent on the flag. Not encouraged but given the breadth of the changes we're making with the flag, I don't think devcg would stand out too much by doing so. Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2013-06-17 13:00 ` Aristeu Rozanski @ 2013-06-17 13:51 ` Michal Hocko [not found] ` <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> 2013-06-17 16:33 ` Lennart Poettering 2 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2013-06-17 13:51 UTC (permalink / raw) To: Tejun Heo Cc: Lennart Poettering, Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers, Johannes Weiner, Aristeu Rozanski On Sun 16-06-13 15:15:56, Tejun Heo wrote: > Hey, Lennart. > > On Sun, Jun 16, 2013 at 09:16:48AM +0200, Lennart Poettering wrote: > > Note that in the long run I'd really like to see functionality added to > > move a full cgroup subtree to a different point in the tree, so that we > > can migrate containers and such atomically from one partition to > > As some configurations are still affected by who one's parent is, we > can't do migration using rename(2) right now. > Some configurations which are legitimate under the current parent > might be invalid when put under a different parent. yes, for example all configurations where old parent is more restrictive than the new one. For example. hardlimit in memcg or even more oom_control, swappiness or use_hierarchy which are expected to be consistent down the hierarchy. > For migration to > work, all configurations should be composable, which currently isn't > true. Off the top of my head, I think the offending ones are cpuset > and device_cgroup, and I *think* the scheme Michal proposed at LSF > breaks composability. The biggest problem I can see is how the core cgroup code know when it is OK to migrate. There might be some ongoing operations that depend on the current tree structure. For example the hierarchical reclaim or oom etc.. I do not think that soft reclaim which I was talking about at LSF would change anything here as it would be pretty much the same as the hard limit. But that is not so important. > In general, configuration composability is something we want. We > don't want descendant configs being updated or becoming invalid as > configurations of an ancestor change - any configuration should be > valid. The effects of some configurations could be no-op depending on > how the ancestors are configured, but that's just the configuration > being interpreted within the limits of its ancestors. > > Li is working to make cpuset composable. As for device_cgroup, I'm > not sure what to do. Aristeu, I know you already put in too much work > into devcg and I'm sorry that I'm raising this now but would you be > interested in tackling this too? > > We want composability regardless of migration, but as for migration > itself, an alternative could be having an atomic "move everything in > cgroup A to cgroup B" interface, so that the admin (whether human or > base system software) can set up a new cgroup and then move the member > tasks atomically. > > > another, without the container having to be aware of this (which of > > course implies that we have a somewhat more useful cgroup namespacing > > solution in place). > > But such approach would definitely be visible to containers. :( > > Thanks. > > -- > tejun -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> @ 2013-06-17 16:51 ` Tejun Heo [not found] ` <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2013-06-17 16:51 UTC (permalink / raw) To: Michal Hocko Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Lennart Poettering, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA Hello, Michal. On Mon, Jun 17, 2013 at 03:51:22PM +0200, Michal Hocko wrote: > > Some configurations which are legitimate under the current parent > > might be invalid when put under a different parent. > > yes, for example all configurations where old parent is more restrictive > than the new one. For example. hardlimit in memcg or even more I'm not following the hardlimit part. Shouldn't a given hardlimit value have the same meaning regardless of where the node is located? Its application will surely be different but its meaning would be the same, no? > oom_control, swappiness or use_hierarchy which are expected to be > consistent down the hierarchy. use_hierarchy is going away. Can you please explain how oom_control and swappiness behave? > The biggest problem I can see is how the core cgroup code know when it is > OK to migrate. There might be some ongoing operations that depend on the > current tree structure. For example the hierarchical reclaim or oom > etc.. Internally, I think it should be implemented as task migrating to another cgroup - IOW, to controllers, it'll appear the same as the userland echoing the pid to cgroup.procs file on the new cgroup. That's the only sane way to implement it as controllers need to do everything which it does for the normal task migration case anyway. Matching the impedance would be the responsibility of cgroup core. > I do not think that soft reclaim which I was talking about at LSF would > change anything here as it would be pretty much the same as the hard > limit. But that is not so important. I think it's very important to have trivially stackable configurations. Maybe we can make more complex semantics work too but it's gonna be confusing like hell when combined with the level of automation we're hoping to achieve. Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org> @ 2013-06-21 8:35 ` Michal Hocko 0 siblings, 0 replies; 14+ messages in thread From: Michal Hocko @ 2013-06-21 8:35 UTC (permalink / raw) To: Tejun Heo Cc: Lennart Poettering, Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers, Johannes Weiner, Aristeu Rozanski On Mon 17-06-13 09:51:29, Tejun Heo wrote: > Hello, Michal. > > On Mon, Jun 17, 2013 at 03:51:22PM +0200, Michal Hocko wrote: > > > Some configurations which are legitimate under the current parent > > > might be invalid when put under a different parent. > > > > yes, for example all configurations where old parent is more restrictive > > than the new one. For example. hardlimit in memcg or even more > > I'm not following the hardlimit part. Shouldn't a given hardlimit > value have the same meaning regardless of where the node is located? > Its application will surely be different but its meaning would be the > same, no? Yes the hardlimit example was misleading. I was thinking about all the consequences. Sorry about the confusion. > > oom_control, swappiness or use_hierarchy which are expected to be > > consistent down the hierarchy. > > use_hierarchy is going away. Can you please explain how oom_control > and swappiness behave? Both have to be consistent throughout the hierarchy currently. The same swappiness is used for hierarchical reclaim at any level. I plan to remove this restriction because this is really too restrictive. We should be able to say that some groups in a hierarchy shouldn't swap etc. I am not sure about oom_control yet, though. OOM is handled at the first hierarchy level which cannot be pushed down to its hard limit and the whole subtree is oom frozen. It would be little awkward if a group down the hierarchy hit its own limit as well and fired OOM while the OOM is frozen up the hierarchy and user space tries to find out the best candidate to kill in that hierarchy: A (under_oom=true) - waiting for userspace \ . . . \ B - hit the limit and call mem_cgroup_out_of_memory On the other hand the handler should be prepared to handle exiting tasks (which might change the oom situation) so the under_oom has to be checked after a victim is selected and before it is killed to prevent from an excessive killing. It would be even more interesting if the situation was opposite. Child has the oom disabled and waits for userspace to handle the situation. A (oom_control=0) / \___________ / \ B (charge) C (oom_control=1) Parent gets under oom as well because a sibling pushes it to the hard limit. Parent wouldn't be able to oom freeze the hierarchy (mem_cgroup_oom_lock) because there is an ongoing oom down the hierarchy so it would be basically waiting for userspace as well. This would be quite unexpected. While the oom freezing could be tweaked to handle this it sounds quite messy to me. > > The biggest problem I can see is how the core cgroup code know when it is > > OK to migrate. There might be some ongoing operations that depend on the > > current tree structure. For example the hierarchical reclaim or oom > > etc.. > > Internally, I think it should be implemented as task migrating to > another cgroup - IOW, to controllers, it'll appear the same as the > userland echoing the pid to cgroup.procs file on the new cgroup. OK, that should work. The controller still should have an option to say that the current configuration is not compatible with its new parent. E.g. for memcg if parent->oom_control != moving_memcg->oom_control so the move would fail. > That's the only sane way to implement it as controllers need to do > everything which it does for the normal task migration case anyway. > Matching the impedance would be the responsibility of cgroup core. > > > I do not think that soft reclaim which I was talking about at LSF would > > change anything here as it would be pretty much the same as the hard > > limit. But that is not so important. > > I think it's very important to have trivially stackable > configurations. Maybe we can make more complex semantics work too but > it's gonna be confusing like hell when combined with the level of > automation we're hoping to achieve. > > Thanks. > > -- > tejun -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 2013-06-17 13:00 ` Aristeu Rozanski 2013-06-17 13:51 ` Michal Hocko @ 2013-06-17 16:33 ` Lennart Poettering [not found] ` <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org> 2 siblings, 1 reply; 14+ messages in thread From: Lennart Poettering @ 2013-06-17 16:33 UTC (permalink / raw) To: Tejun Heo Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Kay Sievers, Michal Hocko, Johannes Weiner, Aristeu Rozanski On Sun, 16.06.13 15:15, Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org) wrote: > We want composability regardless of migration, but as for migration > itself, an alternative could be having an atomic "move everything in > cgroup A to cgroup B" interface, so that the admin (whether human or > base system software) can set up a new cgroup and then move the member > tasks atomically. I am not sure this would cut it for containers. For containers we'll usually have a fully populated cgroup subtree, and if we want to migrate that somewhere else, then we'd something that works recursively and allows us to not tell the container at all about the move (i.e. the namespace of cgroupfs the container sees should ideall stay entirely unaltered by the move). Lennart -- Lennart Poettering - Red Hat, Inc. ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>]
* Re: [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior [not found] ` <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org> @ 2013-06-17 16:53 ` Tejun Heo 0 siblings, 0 replies; 14+ messages in thread From: Tejun Heo @ 2013-06-17 16:53 UTC (permalink / raw) To: Lennart Poettering Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Kay Sievers, Michal Hocko, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA On Mon, Jun 17, 2013 at 06:33:53PM +0200, Lennart Poettering wrote: > On Sun, 16.06.13 15:15, Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org) wrote: > > > We want composability regardless of migration, but as for migration > > itself, an alternative could be having an atomic "move everything in > > cgroup A to cgroup B" interface, so that the admin (whether human or > > base system software) can set up a new cgroup and then move the member > > tasks atomically. > > I am not sure this would cut it for containers. For containers we'll > usually have a fully populated cgroup subtree, and if we want to migrate > that somewhere else, then we'd something that works recursively and > allows us to not tell the container at all about the move (i.e. the > namespace of cgroupfs the container sees should ideall stay entirely > unaltered by the move). Right, I somehow completely forgot about the sub-hierarchy. We can require the userland to set up the same sub-hierarchy with new configs and do migration recursively but that's just yucky and there's no way to make such operations invisible to the users of the sub-hierarchy. If we're gonna do it, I agree the only sane way is properly implementing migration via rename(2). Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-06-21 8:35 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-14 3:47 [PATCH cgroup/for-3.11] cgroup: disallow rename(2) if sane_behavior Tejun Heo
[not found] ` <20130614034717.GA31533-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-14 7:44 ` Li Zefan
2013-06-14 18:18 ` [PATCH v2 " Tejun Heo
[not found] ` <20130614181822.GC6593-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-18 9:11 ` Li Zefan
2013-06-18 15:16 ` Tejun Heo
2013-06-16 7:16 ` [PATCH " Lennart Poettering
[not found] ` <20130616071648.GA1978-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
2013-06-16 22:15 ` Tejun Heo
[not found] ` <20130616221556.GC28587-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-17 13:00 ` Aristeu Rozanski
[not found] ` <20130617130029.GB3212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-06-17 17:06 ` Tejun Heo
2013-06-17 13:51 ` Michal Hocko
[not found] ` <20130617135122.GD5018-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-06-17 16:51 ` Tejun Heo
[not found] ` <20130617165129.GA32663-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-21 8:35 ` Michal Hocko
2013-06-17 16:33 ` Lennart Poettering
[not found] ` <20130617163353.GD22846-kS5D54t9nk0aINubkmmoJbNAH6kLmebB@public.gmane.org>
2013-06-17 16:53 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).