* [PATCH 1/9] cgroup: add cgroup_subsys->post_create()
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-03 8:38 ` Tejun Heo
[not found] ` <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 19:07 ` [PATCH 1/9 v3] " Tejun Heo
2012-11-03 8:38 ` [PATCH 2/9] cgroup: Use rculist ops for cgroup->children Tejun Heo
` (8 subsequent siblings)
9 siblings, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
Currently, there's no way for a controller to find out whether a new
cgroup finished all ->create() allocatinos successfully and is
considered "live" by cgroup.
This becomes a problem later when we add generic descendants walking
to cgroup which can be used by controllers as controllers don't have a
synchronization point where it can synchronize against new cgroups
appearing in such walks.
This patch adds ->post_create(). It's called after all ->create()
succeeded and the cgroup is linked into the generic cgroup hierarchy.
This plays the counterpart of ->pre_destroy().
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
---
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index fe876a7..b442122 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -438,6 +438,7 @@ int cgroup_taskset_size(struct cgroup_taskset *tset);
struct cgroup_subsys {
struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
+ void (*post_create)(struct cgroup *cgrp);
void (*pre_destroy)(struct cgroup *cgrp);
void (*destroy)(struct cgroup *cgrp);
int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e3045ad..f05d992 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4060,10 +4060,15 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
if (err < 0)
goto err_remove;
- /* each css holds a ref to the cgroup's dentry */
- for_each_subsys(root, ss)
+ for_each_subsys(root, ss) {
+ /* each css holds a ref to the cgroup's dentry */
dget(dentry);
+ /* creation succeeded, notify subsystems */
+ if (ss->post_create)
+ ss->post_create(cgrp);
+ }
+
/* The cgroup directory was pre-locked for us */
BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
@@ -4281,6 +4286,9 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
ss->active = 1;
+ if (ss->post_create)
+ ss->post_create(&ss->root->top_cgroup);
+
/* this function shouldn't be used with modular subsystems, since they
* need to register a subsys_id, among other things */
BUG_ON(ss->module);
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 1/9] cgroup: add cgroup_subsys->post_create()
[not found] ` <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-05 13:42 ` Glauber Costa
[not found] ` <5097C23B.3040808-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-11-07 15:25 ` [PATCH 1/9] cgroup: add cgroup_subsys->post_create() Michal Hocko
2012-11-07 17:15 ` [PATCH 1/9 v2] " Tejun Heo
2 siblings, 1 reply; 74+ messages in thread
From: Glauber Costa @ 2012-11-05 13:42 UTC (permalink / raw)
To: Tejun Heo
Cc: rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, mhocko-AlSwsSmVLrQ,
cgroups-u79uwXL29TY76Z2rM5mHXA
On 11/03/2012 09:38 AM, Tejun Heo wrote:
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
>
> This becomes a problem later when we add generic descendants walking
> to cgroup which can be used by controllers as controllers don't have a
> synchronization point where it can synchronize against new cgroups
> appearing in such walks.
>
> This patch adds ->post_create(). It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Tejun, If we do it this way, we end up with two callbacks that are
called after create: post_clone and post_create. I myself prefer the
approach I took, that convert post_clone into post_create, and would
prefer if you would pick that up.
For me, post_clone is totally a glitch that should not exist. Merging
this with post_create gives the following semantics:
* A while after cgroup creation, you will get a callback. In that
callback, you do whatever initialization you may need that you could not
in create. Why is reacting to a flag being set any different?
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/9] cgroup: add cgroup_subsys->post_create()
[not found] ` <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-05 13:42 ` Glauber Costa
@ 2012-11-07 15:25 ` Michal Hocko
[not found] ` <20121107152516.GA4131-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-07 17:15 ` [PATCH 1/9 v2] " Tejun Heo
2 siblings, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2012-11-07 15:25 UTC (permalink / raw)
To: Tejun Heo
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
cgroups-u79uwXL29TY76Z2rM5mHXA
On Sat 03-11-12 01:38:27, Tejun Heo wrote:
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
>
> This becomes a problem later when we add generic descendants walking
> to cgroup which can be used by controllers as controllers don't have a
> synchronization point where it can synchronize against new cgroups
> appearing in such walks.
>
> This patch adds ->post_create(). It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
Hmm, I had to look at "cgroup_freezer: implement proper hierarchy
support" to actually understand what is the callback good for. The above
sounds as if the callback is needed when a controller wants to use
the new iterators or when pre_destroy is defined.
I think it would be helpful if the changelog described that the callback
is needed when the controller keeps a mutable shared state for the
hierarchy. For example memory controller doesn't have any such a strict
requirement so we can safely use your new iterators without pre_destroy.
Anyway, I like this change because the shared state is now really easy
to implement.
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> ---
> include/linux/cgroup.h | 1 +
> kernel/cgroup.c | 12 ++++++++++--
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index fe876a7..b442122 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -438,6 +438,7 @@ int cgroup_taskset_size(struct cgroup_taskset *tset);
>
> struct cgroup_subsys {
> struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
> + void (*post_create)(struct cgroup *cgrp);
> void (*pre_destroy)(struct cgroup *cgrp);
> void (*destroy)(struct cgroup *cgrp);
> int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e3045ad..f05d992 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4060,10 +4060,15 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
> if (err < 0)
> goto err_remove;
>
> - /* each css holds a ref to the cgroup's dentry */
> - for_each_subsys(root, ss)
> + for_each_subsys(root, ss) {
> + /* each css holds a ref to the cgroup's dentry */
> dget(dentry);
>
> + /* creation succeeded, notify subsystems */
> + if (ss->post_create)
> + ss->post_create(cgrp);
> + }
> +
> /* The cgroup directory was pre-locked for us */
> BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
>
> @@ -4281,6 +4286,9 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
>
> ss->active = 1;
>
> + if (ss->post_create)
> + ss->post_create(&ss->root->top_cgroup);
> +
> /* this function shouldn't be used with modular subsystems, since they
> * need to register a subsys_id, among other things */
> BUG_ON(ss->module);
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 1/9 v2] cgroup: add cgroup_subsys->post_create()
[not found] ` <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-05 13:42 ` Glauber Costa
2012-11-07 15:25 ` [PATCH 1/9] cgroup: add cgroup_subsys->post_create() Michal Hocko
@ 2012-11-07 17:15 ` Tejun Heo
[not found] ` <20121107171508.GF2660-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-07 17:15 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
Currently, there's no way for a controller to find out whether a new
cgroup finished all ->create() allocatinos successfully and is
considered "live" by cgroup.
This becomes a problem later when we add generic descendants walking
to cgroup which can be used by controllers as controllers don't have a
synchronization point where it can synchronize against new cgroups
appearing in such walks.
This patch adds ->post_create(). It's called after all ->create()
succeeded and the cgroup is linked into the generic cgroup hierarchy.
This plays the counterpart of ->pre_destroy().
When used in combination with the to-be-added generic descendant
iterators, ->post_create() can be used to implement reliable state
inheritance. It will be explained with the descendant iterators.
v2: Added a paragraph about its future use w/ descendant iterators per
Michal.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
---
If you can explain it better, please be my guest.
Thanks.
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 2 deletions(-)
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -438,6 +438,7 @@ int cgroup_taskset_size(struct cgroup_ta
struct cgroup_subsys {
struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
+ void (*post_create)(struct cgroup *cgrp);
void (*pre_destroy)(struct cgroup *cgrp);
void (*destroy)(struct cgroup *cgrp);
int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4060,10 +4060,15 @@ static long cgroup_create(struct cgroup
if (err < 0)
goto err_remove;
- /* each css holds a ref to the cgroup's dentry */
- for_each_subsys(root, ss)
+ for_each_subsys(root, ss) {
+ /* each css holds a ref to the cgroup's dentry */
dget(dentry);
+ /* creation succeeded, notify subsystems */
+ if (ss->post_create)
+ ss->post_create(cgrp);
+ }
+
/* The cgroup directory was pre-locked for us */
BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
@@ -4281,6 +4286,9 @@ static void __init cgroup_init_subsys(st
ss->active = 1;
+ if (ss->post_create)
+ ss->post_create(&ss->root->top_cgroup);
+
/* this function shouldn't be used with modular subsystems, since they
* need to register a subsys_id, among other things */
BUG_ON(ss->module);
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 1/9 v3] cgroup: add cgroup_subsys->post_create()
2012-11-03 8:38 ` [PATCH 1/9] cgroup: add cgroup_subsys->post_create() Tejun Heo
[not found] ` <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 19:07 ` Tejun Heo
2012-11-09 9:09 ` Li Zefan
[not found] ` <20121108190715.GD9672-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
1 sibling, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-08 19:07 UTC (permalink / raw)
To: lizefan, mhocko, rjw
Cc: containers, cgroups, linux-kernel, linux-pm, fweisbec,
Glauber Costa
Subject: cgroup: add cgroup_subsys->post_create()
Currently, there's no way for a controller to find out whether a new
cgroup finished all ->create() allocatinos successfully and is
considered "live" by cgroup.
This becomes a problem later when we add generic descendants walking
to cgroup which can be used by controllers as controllers don't have a
synchronization point where it can synchronize against new cgroups
appearing in such walks.
This patch adds ->post_create(). It's called after all ->create()
succeeded and the cgroup is linked into the generic cgroup hierarchy.
This plays the counterpart of ->pre_destroy().
When used in combination with the to-be-added generic descendant
iterators, ->post_create() can be used to implement reliable state
inheritance. It will be explained with the descendant iterators.
v2: Added a paragraph about its future use w/ descendant iterators per
Michal.
v3: Forgot to add ->post_create() invocation to cgroup_load_subsys().
Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
---
Oops, forgot updating cgroup_load_subsys(). Hate that it's a
different path from cgroup_init_subsys(). :(
Thanks.
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 15 +++++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -438,6 +438,7 @@ int cgroup_taskset_size(struct cgroup_ta
struct cgroup_subsys {
struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
+ void (*post_create)(struct cgroup *cgrp);
void (*pre_destroy)(struct cgroup *cgrp);
void (*destroy)(struct cgroup *cgrp);
int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4059,10 +4059,15 @@ static long cgroup_create(struct cgroup
if (err < 0)
goto err_remove;
- /* each css holds a ref to the cgroup's dentry */
- for_each_subsys(root, ss)
+ for_each_subsys(root, ss) {
+ /* each css holds a ref to the cgroup's dentry */
dget(dentry);
+ /* creation succeeded, notify subsystems */
+ if (ss->post_create)
+ ss->post_create(cgrp);
+ }
+
/* The cgroup directory was pre-locked for us */
BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
@@ -4280,6 +4285,9 @@ static void __init cgroup_init_subsys(st
ss->active = 1;
+ if (ss->post_create)
+ ss->post_create(&ss->root->top_cgroup);
+
/* this function shouldn't be used with modular subsystems, since they
* need to register a subsys_id, among other things */
BUG_ON(ss->module);
@@ -4389,6 +4397,9 @@ int __init_or_module cgroup_load_subsys(
ss->active = 1;
+ if (ss->post_create)
+ ss->post_create(&ss->root->top_cgroup);
+
/* success! */
mutex_unlock(&cgroup_mutex);
return 0;
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/9 v3] cgroup: add cgroup_subsys->post_create()
2012-11-08 19:07 ` [PATCH 1/9 v3] " Tejun Heo
@ 2012-11-09 9:09 ` Li Zefan
[not found] ` <20121108190715.GD9672-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
1 sibling, 0 replies; 74+ messages in thread
From: Li Zefan @ 2012-11-09 9:09 UTC (permalink / raw)
To: Tejun Heo
Cc: mhocko, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec, Glauber Costa
On 2012/11/9 3:07, Tejun Heo wrote:
> Subject: cgroup: add cgroup_subsys->post_create()
>
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
>
> This becomes a problem later when we add generic descendants walking
> to cgroup which can be used by controllers as controllers don't have a
> synchronization point where it can synchronize against new cgroups
> appearing in such walks.
>
> This patch adds ->post_create(). It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
>
> When used in combination with the to-be-added generic descendant
> iterators, ->post_create() can be used to implement reliable state
> inheritance. It will be explained with the descendant iterators.
>
> v2: Added a paragraph about its future use w/ descendant iterators per
> Michal.
>
> v3: Forgot to add ->post_create() invocation to cgroup_load_subsys().
> Fixed.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Glauber Costa <glommer@parallels.com>
Li Zefan <lizefan@huawei.com
^ permalink raw reply [flat|nested] 74+ messages in thread
[parent not found: <20121108190715.GD9672-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCH 1/9 v3] cgroup: add cgroup_subsys->post_create()
[not found] ` <20121108190715.GD9672-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2012-11-09 9:09 ` Li Zefan
2012-11-09 11:09 ` Daniel Wagner
1 sibling, 0 replies; 74+ messages in thread
From: Li Zefan @ 2012-11-09 9:09 UTC (permalink / raw)
To: Tejun Heo
Cc: mhocko-AlSwsSmVLrQ, rjw-KKrjLPT3xs0,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
Glauber Costa
On 2012/11/9 3:07, Tejun Heo wrote:
> Subject: cgroup: add cgroup_subsys->post_create()
>
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
>
> This becomes a problem later when we add generic descendants walking
> to cgroup which can be used by controllers as controllers don't have a
> synchronization point where it can synchronize against new cgroups
> appearing in such walks.
>
> This patch adds ->post_create(). It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
>
> When used in combination with the to-be-added generic descendant
> iterators, ->post_create() can be used to implement reliable state
> inheritance. It will be explained with the descendant iterators.
>
> v2: Added a paragraph about its future use w/ descendant iterators per
> Michal.
>
> v3: Forgot to add ->post_create() invocation to cgroup_load_subsys().
> Fixed.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/9 v3] cgroup: add cgroup_subsys->post_create()
[not found] ` <20121108190715.GD9672-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-11-09 9:09 ` Li Zefan
@ 2012-11-09 11:09 ` Daniel Wagner
[not found] ` <509CE472.9040504-kQCPcA+X3s7YtjvyW6yDsg@public.gmane.org>
1 sibling, 1 reply; 74+ messages in thread
From: Daniel Wagner @ 2012-11-09 11:09 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
Glauber Costa
Hi Tejun,
On 08.11.2012 20:07, Tejun Heo wrote:> Subject: cgroup: add
cgroup_subsys->post_create()
>
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
I'd like add hierarchy support to net_prio and the first thing to
do is to get rid of get_prioidx(). It looks like it would be nice to be
able to use use_id and post_create() for this but as I read the code
this might not work because the netdev might access resources allocated
between create() and post_create(). So my question is would it make
sense to move
cgroup_create():
if (ss->use_id) {
err = alloc_css_id(ss, parent, cgrp);
if (err)
goto err_destroy;
}
part before create() or add some protection between create() and
post_create() callback in net_prio. I have a patch but I see
I could drop it completely if post_create() is there.
cheers,
daniel
From 84fbbdf0dc5d3460389e39a00a3ee553ee55b563 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>
Date: Thu, 8 Nov 2012 17:17:21 +0100
Subject: [PATCH] cgroups: net_prio: Use IDR library to assign netprio index
get_prioidx() allocated a new id whenever it was called. put_prioidx()
gave an id back. get_pioidx() could can reallocate the id later on.
So that is exactly what IDR offers and so let's use it.
Signed-off-by: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>
---
net/core/netprio_cgroup.c | 51
+++++++++--------------------------------------
1 file changed, 9 insertions(+), 42 deletions(-)
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 847c02b..3c1b612 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -27,10 +27,7 @@
#include <linux/fdtable.h>
-#define PRIOIDX_SZ 128
-
-static unsigned long prioidx_map[PRIOIDX_SZ];
-static DEFINE_SPINLOCK(prioidx_map_lock);
+static DEFINE_IDA(netprio_ida);
static atomic_t max_prioidx = ATOMIC_INIT(0);
static inline struct cgroup_netprio_state *cgrp_netprio_state(struct
cgroup *cgrp)
@@ -39,34 +36,6 @@ static inline struct cgroup_netprio_state
*cgrp_netprio_state(struct cgroup *cgr
struct cgroup_netprio_state, css);
}
-static int get_prioidx(u32 *prio)
-{
- unsigned long flags;
- u32 prioidx;
-
- spin_lock_irqsave(&prioidx_map_lock, flags);
- prioidx = find_first_zero_bit(prioidx_map, sizeof(unsigned long) *
PRIOIDX_SZ);
- if (prioidx == sizeof(unsigned long) * PRIOIDX_SZ) {
- spin_unlock_irqrestore(&prioidx_map_lock, flags);
- return -ENOSPC;
- }
- set_bit(prioidx, prioidx_map);
- if (atomic_read(&max_prioidx) < prioidx)
- atomic_set(&max_prioidx, prioidx);
- spin_unlock_irqrestore(&prioidx_map_lock, flags);
- *prio = prioidx;
- return 0;
-}
-
-static void put_prioidx(u32 idx)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&prioidx_map_lock, flags);
- clear_bit(idx, prioidx_map);
- spin_unlock_irqrestore(&prioidx_map_lock, flags);
-}
-
static int extend_netdev_table(struct net_device *dev, u32 new_len)
{
size_t new_size = sizeof(struct netprio_map) +
@@ -120,9 +89,9 @@ static struct cgroup_subsys_state *cgrp_create(struct
cgroup *cgrp)
if (cgrp->parent && cgrp_netprio_state(cgrp->parent)->prioidx)
goto out;
- ret = get_prioidx(&cs->prioidx);
- if (ret < 0) {
- pr_warn("No space in priority index array\n");
+ cs->prioidx = ida_simple_get(&netprio_ida, 0, 0, GFP_KERNEL);
+ if (cs->prioidx < 0) {
+ ret = cs->prioidx;
goto out;
}
@@ -146,7 +115,7 @@ static void cgrp_destroy(struct cgroup *cgrp)
map->priomap[cs->prioidx] = 0;
}
rtnl_unlock();
- put_prioidx(cs->prioidx);
+ ida_simple_remove(&netprio_ida, cs->prioidx);
kfree(cs);
}
@@ -284,12 +253,10 @@ struct cgroup_subsys net_prio_subsys = {
.module = THIS_MODULE,
/*
- * net_prio has artificial limit on the number of cgroups and
- * disallows nesting making it impossible to co-mount it with other
- * hierarchical subsystems. Remove the artificially low PRIOIDX_SZ
- * limit and properly nest configuration such that children follow
- * their parents' configurations by default and are allowed to
- * override and remove the following.
+ * net_prio has artificial limit on properly nest
+ * configuration such that children follow their parents'
+ * configurations by default and are allowed to override and
+ * remove the following.
*/
.broken_hierarchy = true,
};
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 2/9] cgroup: Use rculist ops for cgroup->children
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-03 8:38 ` [PATCH 1/9] cgroup: add cgroup_subsys->post_create() Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
2012-11-07 15:30 ` Michal Hocko
[not found] ` <1351931915-1701-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-03 8:38 ` [PATCH 3/9] cgroup: implement generic child / descendant walk macros Tejun Heo
` (7 subsequent siblings)
9 siblings, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
Use RCU safe list operations for cgroup->children. This will be used
to implement cgroup children / descendant walking which can be used by
controllers.
Note that cgroup_create() now puts a new cgroup at the end of the
->children list instead of head. This isn't strictly necessary but is
done so that the iteration order is more conventional.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 8 +++-----
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b442122..90c33eb 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -12,6 +12,7 @@
#include <linux/cpumask.h>
#include <linux/nodemask.h>
#include <linux/rcupdate.h>
+#include <linux/rculist.h>
#include <linux/cgroupstats.h>
#include <linux/prio_heap.h>
#include <linux/rwsem.h>
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f05d992..cc5d2a0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1650,7 +1650,6 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
free_cg_links(&tmp_cg_links);
- BUG_ON(!list_empty(&root_cgrp->sibling));
BUG_ON(!list_empty(&root_cgrp->children));
BUG_ON(root->number_of_cgroups != 1);
@@ -1699,7 +1698,6 @@ static void cgroup_kill_sb(struct super_block *sb) {
BUG_ON(root->number_of_cgroups != 1);
BUG_ON(!list_empty(&cgrp->children));
- BUG_ON(!list_empty(&cgrp->sibling));
mutex_lock(&cgroup_mutex);
mutex_lock(&cgroup_root_mutex);
@@ -4053,7 +4051,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
}
}
- list_add(&cgrp->sibling, &cgrp->parent->children);
+ list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
root->number_of_cgroups++;
err = cgroup_create_dir(cgrp, dentry, mode);
@@ -4084,7 +4082,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
err_remove:
- list_del(&cgrp->sibling);
+ list_del_rcu(&cgrp->sibling);
root->number_of_cgroups--;
err_destroy:
@@ -4210,7 +4208,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
raw_spin_unlock(&release_list_lock);
/* delete this cgroup from parent->children */
- list_del_init(&cgrp->sibling);
+ list_del_rcu(&cgrp->sibling);
list_del_init(&cgrp->allcg_node);
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 2/9] cgroup: Use rculist ops for cgroup->children
2012-11-03 8:38 ` [PATCH 2/9] cgroup: Use rculist ops for cgroup->children Tejun Heo
@ 2012-11-07 15:30 ` Michal Hocko
[not found] ` <1351931915-1701-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2012-11-07 15:30 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
On Sat 03-11-12 01:38:28, Tejun Heo wrote:
> Use RCU safe list operations for cgroup->children. This will be used
> to implement cgroup children / descendant walking which can be used by
> controllers.
>
> Note that cgroup_create() now puts a new cgroup at the end of the
> ->children list instead of head. This isn't strictly necessary but is
> done so that the iteration order is more conventional.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> include/linux/cgroup.h | 1 +
> kernel/cgroup.c | 8 +++-----
> 2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index b442122..90c33eb 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -12,6 +12,7 @@
> #include <linux/cpumask.h>
> #include <linux/nodemask.h>
> #include <linux/rcupdate.h>
> +#include <linux/rculist.h>
> #include <linux/cgroupstats.h>
> #include <linux/prio_heap.h>
> #include <linux/rwsem.h>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index f05d992..cc5d2a0 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1650,7 +1650,6 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>
> free_cg_links(&tmp_cg_links);
>
> - BUG_ON(!list_empty(&root_cgrp->sibling));
> BUG_ON(!list_empty(&root_cgrp->children));
> BUG_ON(root->number_of_cgroups != 1);
>
> @@ -1699,7 +1698,6 @@ static void cgroup_kill_sb(struct super_block *sb) {
>
> BUG_ON(root->number_of_cgroups != 1);
> BUG_ON(!list_empty(&cgrp->children));
> - BUG_ON(!list_empty(&cgrp->sibling));
>
> mutex_lock(&cgroup_mutex);
> mutex_lock(&cgroup_root_mutex);
> @@ -4053,7 +4051,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
> }
> }
>
> - list_add(&cgrp->sibling, &cgrp->parent->children);
> + list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
> root->number_of_cgroups++;
>
> err = cgroup_create_dir(cgrp, dentry, mode);
> @@ -4084,7 +4082,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
>
> err_remove:
>
> - list_del(&cgrp->sibling);
> + list_del_rcu(&cgrp->sibling);
> root->number_of_cgroups--;
>
> err_destroy:
> @@ -4210,7 +4208,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
> raw_spin_unlock(&release_list_lock);
>
> /* delete this cgroup from parent->children */
> - list_del_init(&cgrp->sibling);
> + list_del_rcu(&cgrp->sibling);
>
> list_del_init(&cgrp->allcg_node);
>
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 2/9] cgroup: Use rculist ops for cgroup->children
[not found] ` <1351931915-1701-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 3:01 ` Kamezawa Hiroyuki
2012-11-09 9:10 ` Li Zefan
1 sibling, 0 replies; 74+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-08 3:01 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
(2012/11/03 17:38), Tejun Heo wrote:
> Use RCU safe list operations for cgroup->children. This will be used
> to implement cgroup children / descendant walking which can be used by
> controllers.
>
> Note that cgroup_create() now puts a new cgroup at the end of the
> ->children list instead of head. This isn't strictly necessary but is
> done so that the iteration order is more conventional.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 2/9] cgroup: Use rculist ops for cgroup->children
[not found] ` <1351931915-1701-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 3:01 ` Kamezawa Hiroyuki
@ 2012-11-09 9:10 ` Li Zefan
1 sibling, 0 replies; 74+ messages in thread
From: Li Zefan @ 2012-11-09 9:10 UTC (permalink / raw)
To: Tejun Heo
Cc: rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, mhocko-AlSwsSmVLrQ,
cgroups-u79uwXL29TY76Z2rM5mHXA
On 2012/11/3 16:38, Tejun Heo wrote:
> Use RCU safe list operations for cgroup->children. This will be used
> to implement cgroup children / descendant walking which can be used by
> controllers.
>
> Note that cgroup_create() now puts a new cgroup at the end of the
> ->children list instead of head. This isn't strictly necessary but is
> done so that the iteration order is more conventional.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 3/9] cgroup: implement generic child / descendant walk macros
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-03 8:38 ` [PATCH 1/9] cgroup: add cgroup_subsys->post_create() Tejun Heo
2012-11-03 8:38 ` [PATCH 2/9] cgroup: Use rculist ops for cgroup->children Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
2012-11-06 20:31 ` Tejun Heo
` (3 more replies)
2012-11-03 8:38 ` [PATCH 4/9] cgroup_freezer: trivial cleanups Tejun Heo
` (6 subsequent siblings)
9 siblings, 4 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
Currently, cgroup doesn't provide any generic helper for walking a
given cgroup's children or descendants. This patch adds the following
three macros.
* cgroup_for_each_child() - walk immediate children of a cgroup.
* cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
in pre-order tree traversal.
* cgroup_for_each_descendant_post() - visit all descendants of a
cgroup in post-order tree traversal.
All three only require the user to hold RCU read lock during
traversal. Verifying that each iterated cgroup is online is the
responsibility of the user. When used with proper synchronization,
cgroup_for_each_descendant_pre() can be used to propagate config
updates to descendants in reliable way. See comments for details.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/linux/cgroup.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++
kernel/cgroup.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 168 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 90c33eb..0020329 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -534,6 +534,88 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
return task_subsys_state(task, subsys_id)->cgroup;
}
+/**
+ * cgroup_for_each_child - iterate through children of a cgroup
+ * @pos: the cgroup * to use as the loop cursor
+ * @cgroup: cgroup whose children to walk
+ *
+ * Walk @cgroup's children. Must be called under rcu_read_lock(). A child
+ * cgroup which hasn't finished ->post_create() or already has finished
+ * ->pre_destroy() may show up during traversal and it's each subsystem's
+ * responsibility to verify that each @pos is alive.
+ *
+ * If a subsystem synchronizes against the parent in its ->post_create()
+ * and before starting iterating, a cgroup which finished ->post_create()
+ * is guaranteed to be visible in the future iterations.
+ */
+#define cgroup_for_each_child(pos, cgroup) \
+ list_for_each_entry_rcu(pos, &(cgroup)->children, sibling)
+
+struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
+ struct cgroup *cgroup);
+
+/**
+ * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
+ * @pos: the cgroup * to use as the loop cursor
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
+ * descendant cgroup which hasn't finished ->post_create() or already has
+ * finished ->pre_destroy() may show up during traversal and it's each
+ * subsystem's responsibility to verify that each @pos is alive.
+ *
+ * If a subsystem synchronizes against the parent in its ->post_create()
+ * and before starting iterating, and synchronizes against @pos on each
+ * iteration, any descendant cgroup which finished ->post_create() is
+ * guaranteed to be visible in the future iterations.
+ *
+ * In other words, the following guarantees that a descendant can't escape
+ * configuration of its ancestors.
+ *
+ * my_post_create(@cgrp)
+ * {
+ * Lock @cgrp->parent and @cgrp;
+ * Inherit config from @cgrp->parent;
+ * Unlock both.
+ * }
+ *
+ * my_update_config(@cgrp)
+ * {
+ * Lock @cgrp;
+ * Update @cgrp's config;
+ * Unlock @cgrp;
+ *
+ * cgroup_for_each_descendant_pre(@pos, @cgrp) {
+ * Lock @pos;
+ * Verify @pos is alive and inherit config from @pos->parent;
+ * Unlock @pos;
+ * }
+ * }
+ *
+ * Alternatively, a subsystem may choose to use a single global lock to
+ * synchronize ->post_create() and ->pre_destroy() against tree-walking
+ * operations.
+ */
+#define cgroup_for_each_descendant_pre(pos, cgroup) \
+ for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos); \
+ pos = cgroup_next_descendant_pre((pos), (cgroup)))
+
+struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
+ struct cgroup *cgroup);
+
+/**
+ * cgroup_for_each_descendant_post - post-order walk of a cgroup's descendants
+ * @pos: the cgroup * to use as the loop cursor
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * Similar to cgroup_for_each_descendant_pre() but performs post-order
+ * traversal instead. Note that the walk visibility guarantee described in
+ * pre-order walk doesn't apply the same to post-order walks.
+ */
+#define cgroup_for_each_descendant_post(pos, cgroup) \
+ for (pos = cgroup_next_descendant_post(NULL, (cgroup)); (pos); \
+ pos = cgroup_next_descendant_post((pos), (cgroup)))
+
/* A cgroup_iter should be treated as an opaque object */
struct cgroup_iter {
struct list_head *cg_link;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index cc5d2a0..8bd662c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2985,6 +2985,92 @@ static void cgroup_enable_task_cg_lists(void)
write_unlock(&css_set_lock);
}
+/**
+ * cgroup_next_descendant_pre - find the next descendant for pre-order walk
+ * @pos: the current position (%NULL to initiate traversal)
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * To be used by cgroup_for_each_descendant_pre(). Find the next
+ * descendant to visit for pre-order traversal of @cgroup's descendants.
+ */
+struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
+ struct cgroup *cgroup)
+{
+ struct cgroup *next;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ /* if first iteration, pretend we just visited @cgroup */
+ if (!pos) {
+ if (list_empty(&cgroup->children))
+ return NULL;
+ pos = cgroup;
+ }
+
+ /* visit the first child if exists */
+ next = list_first_or_null_rcu(&pos->children, struct cgroup, sibling);
+ if (next)
+ return next;
+
+ /* no child, visit my or the closest ancestor's next sibling */
+ do {
+ next = list_entry_rcu(pos->sibling.next, struct cgroup,
+ sibling);
+ if (&next->sibling != &pos->parent->children)
+ return next;
+
+ pos = pos->parent;
+ } while (pos != cgroup);
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
+
+static struct cgroup *cgroup_leftmost_descendant(struct cgroup *pos)
+{
+ struct cgroup *last;
+
+ do {
+ last = pos;
+ pos = list_first_or_null_rcu(&pos->children, struct cgroup,
+ sibling);
+ } while (pos);
+
+ return last;
+}
+
+/**
+ * cgroup_next_descendant_post - find the next descendant for post-order walk
+ * @pos: the current position (%NULL to initiate traversal)
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * To be used by cgroup_for_each_descendant_post(). Find the next
+ * descendant to visit for post-order traversal of @cgroup's descendants.
+ */
+struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
+ struct cgroup *cgroup)
+{
+ struct cgroup *next;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ /* if first iteration, visit the leftmost descendant */
+ if (!pos) {
+ next = cgroup_leftmost_descendant(cgroup);
+ return next != cgroup ? next : NULL;
+ }
+
+ /* if there's an unvisited sibling, visit its leftmost descendant */
+ next = list_entry_rcu(pos->sibling.next, struct cgroup, sibling);
+ if (&next->sibling != &pos->parent->children)
+ return cgroup_leftmost_descendant(next);
+
+ /* no sibling left, visit parent */
+ next = pos->parent;
+ return next != cgroup ? next : NULL;
+}
+EXPORT_SYMBOL_GPL(cgroup_next_descendant_post);
+
void cgroup_iter_start(struct cgroup *cgrp, struct cgroup_iter *it)
__acquires(css_set_lock)
{
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
2012-11-03 8:38 ` [PATCH 3/9] cgroup: implement generic child / descendant walk macros Tejun Heo
@ 2012-11-06 20:31 ` Tejun Heo
[not found] ` <20121106203154.GV30069-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2012-11-07 16:54 ` Michal Hocko
` (2 subsequent siblings)
3 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-06 20:31 UTC (permalink / raw)
To: lizefan, mhocko, rjw
Cc: containers, cgroups, linux-kernel, linux-pm, fweisbec
On Sat, Nov 03, 2012 at 01:38:29AM -0700, Tejun Heo wrote:
> Currently, cgroup doesn't provide any generic helper for walking a
> given cgroup's children or descendants. This patch adds the following
> three macros.
>
> * cgroup_for_each_child() - walk immediate children of a cgroup.
>
> * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
> in pre-order tree traversal.
>
> * cgroup_for_each_descendant_post() - visit all descendants of a
> cgroup in post-order tree traversal.
>
> All three only require the user to hold RCU read lock during
> traversal. Verifying that each iterated cgroup is online is the
> responsibility of the user. When used with proper synchronization,
> cgroup_for_each_descendant_pre() can be used to propagate config
> updates to descendants in reliable way. See comments for details.
Michal, Li, how does this look to you? Would this be okay for memcg
too? Li, do you think the comment on cgroup_for_each_descendant_pre()
is correct?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
2012-11-03 8:38 ` [PATCH 3/9] cgroup: implement generic child / descendant walk macros Tejun Heo
2012-11-06 20:31 ` Tejun Heo
@ 2012-11-07 16:54 ` Michal Hocko
[not found] ` <20121107165457.GD4131-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
[not found] ` <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 17:59 ` [PATCH 3/9 v2] " Tejun Heo
3 siblings, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2012-11-07 16:54 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
On Sat 03-11-12 01:38:29, Tejun Heo wrote:
[...]
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cc5d2a0..8bd662c 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2985,6 +2985,92 @@ static void cgroup_enable_task_cg_lists(void)
> write_unlock(&css_set_lock);
> }
>
> +/**
> + * cgroup_next_descendant_pre - find the next descendant for pre-order walk
> + * @pos: the current position (%NULL to initiate traversal)
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * To be used by cgroup_for_each_descendant_pre(). Find the next
> + * descendant to visit for pre-order traversal of @cgroup's descendants.
> + */
> +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> + struct cgroup *cgroup)
> +{
> + struct cgroup *next;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + /* if first iteration, pretend we just visited @cgroup */
> + if (!pos) {
> + if (list_empty(&cgroup->children))
> + return NULL;
> + pos = cgroup;
> + }
Is there any specific reason why the root of the tree is excluded?
This is bit impractical because you have to special case the root
in the code.
> +
> + /* visit the first child if exists */
> + next = list_first_or_null_rcu(&pos->children, struct cgroup, sibling);
> + if (next)
> + return next;
> +
> + /* no child, visit my or the closest ancestor's next sibling */
> + do {
> + next = list_entry_rcu(pos->sibling.next, struct cgroup,
> + sibling);
> + if (&next->sibling != &pos->parent->children)
> + return next;
> +
> + pos = pos->parent;
> + } while (pos != cgroup);
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
[...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
[not found] ` <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 3:21 ` Kamezawa Hiroyuki
2012-11-08 9:50 ` Michal Hocko
1 sibling, 0 replies; 74+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-08 3:21 UTC (permalink / raw)
To: Tejun Heo
Cc: rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, mhocko-AlSwsSmVLrQ,
cgroups-u79uwXL29TY76Z2rM5mHXA
(2012/11/03 17:38), Tejun Heo wrote:
> Currently, cgroup doesn't provide any generic helper for walking a
> given cgroup's children or descendants. This patch adds the following
> three macros.
>
> * cgroup_for_each_child() - walk immediate children of a cgroup.
>
> * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
> in pre-order tree traversal.
>
> * cgroup_for_each_descendant_post() - visit all descendants of a
> cgroup in post-order tree traversal.
>
> All three only require the user to hold RCU read lock during
> traversal. Verifying that each iterated cgroup is online is the
> responsibility of the user. When used with proper synchronization,
> cgroup_for_each_descendant_pre() can be used to propagate config
> updates to descendants in reliable way. See comments for details.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
maybe better than using css->id in some(many) case.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-c2VE3kR2zFovtab9mdV7tw@public.gmane.org>
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
[not found] ` <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 3:21 ` Kamezawa Hiroyuki
@ 2012-11-08 9:50 ` Michal Hocko
2012-11-08 17:15 ` Tejun Heo
1 sibling, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2012-11-08 9:50 UTC (permalink / raw)
To: Tejun Heo
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
cgroups-u79uwXL29TY76Z2rM5mHXA
On Sat 03-11-12 01:38:29, Tejun Heo wrote:
> Currently, cgroup doesn't provide any generic helper for walking a
> given cgroup's children or descendants. This patch adds the following
> three macros.
>
> * cgroup_for_each_child() - walk immediate children of a cgroup.
>
> * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
> in pre-order tree traversal.
>
> * cgroup_for_each_descendant_post() - visit all descendants of a
> cgroup in post-order tree traversal.
>
> All three only require the user to hold RCU read lock during
> traversal. Verifying that each iterated cgroup is online is the
> responsibility of the user. When used with proper synchronization,
> cgroup_for_each_descendant_pre() can be used to propagate config
> updates to descendants in reliable way. See comments for details.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
I will convert mem_cgroup_iter to use this rather than css_get_next
after this gets into the next tree so that it can fly via Andrew.
Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Just a minor knit. You are talking about a config propagation while I
would consider state propagation more clear and less confusing. Config
is usually stable enough so that post_create is not necessary for
syncing (e.g. memcg.swappiness). It is a state which must be consistent
throughout the hierarchy which matters here.
Thanks!
> ---
> include/linux/cgroup.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++
> kernel/cgroup.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 168 insertions(+)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 90c33eb..0020329 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -534,6 +534,88 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
> return task_subsys_state(task, subsys_id)->cgroup;
> }
>
> +/**
> + * cgroup_for_each_child - iterate through children of a cgroup
> + * @pos: the cgroup * to use as the loop cursor
> + * @cgroup: cgroup whose children to walk
> + *
> + * Walk @cgroup's children. Must be called under rcu_read_lock(). A child
> + * cgroup which hasn't finished ->post_create() or already has finished
> + * ->pre_destroy() may show up during traversal and it's each subsystem's
> + * responsibility to verify that each @pos is alive.
> + *
> + * If a subsystem synchronizes against the parent in its ->post_create()
> + * and before starting iterating, a cgroup which finished ->post_create()
> + * is guaranteed to be visible in the future iterations.
> + */
> +#define cgroup_for_each_child(pos, cgroup) \
> + list_for_each_entry_rcu(pos, &(cgroup)->children, sibling)
> +
> +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> + struct cgroup *cgroup);
> +
> +/**
> + * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
> + * @pos: the cgroup * to use as the loop cursor
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
> + * descendant cgroup which hasn't finished ->post_create() or already has
> + * finished ->pre_destroy() may show up during traversal and it's each
> + * subsystem's responsibility to verify that each @pos is alive.
> + *
> + * If a subsystem synchronizes against the parent in its ->post_create()
> + * and before starting iterating, and synchronizes against @pos on each
> + * iteration, any descendant cgroup which finished ->post_create() is
> + * guaranteed to be visible in the future iterations.
> + *
> + * In other words, the following guarantees that a descendant can't escape
> + * configuration of its ancestors.
> + *
> + * my_post_create(@cgrp)
> + * {
> + * Lock @cgrp->parent and @cgrp;
> + * Inherit config from @cgrp->parent;
> + * Unlock both.
> + * }
> + *
> + * my_update_config(@cgrp)
> + * {
> + * Lock @cgrp;
> + * Update @cgrp's config;
> + * Unlock @cgrp;
> + *
> + * cgroup_for_each_descendant_pre(@pos, @cgrp) {
> + * Lock @pos;
> + * Verify @pos is alive and inherit config from @pos->parent;
> + * Unlock @pos;
> + * }
> + * }
> + *
> + * Alternatively, a subsystem may choose to use a single global lock to
> + * synchronize ->post_create() and ->pre_destroy() against tree-walking
> + * operations.
> + */
> +#define cgroup_for_each_descendant_pre(pos, cgroup) \
> + for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos); \
> + pos = cgroup_next_descendant_pre((pos), (cgroup)))
> +
> +struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
> + struct cgroup *cgroup);
> +
> +/**
> + * cgroup_for_each_descendant_post - post-order walk of a cgroup's descendants
> + * @pos: the cgroup * to use as the loop cursor
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * Similar to cgroup_for_each_descendant_pre() but performs post-order
> + * traversal instead. Note that the walk visibility guarantee described in
> + * pre-order walk doesn't apply the same to post-order walks.
> + */
> +#define cgroup_for_each_descendant_post(pos, cgroup) \
> + for (pos = cgroup_next_descendant_post(NULL, (cgroup)); (pos); \
> + pos = cgroup_next_descendant_post((pos), (cgroup)))
> +
> /* A cgroup_iter should be treated as an opaque object */
> struct cgroup_iter {
> struct list_head *cg_link;
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cc5d2a0..8bd662c 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2985,6 +2985,92 @@ static void cgroup_enable_task_cg_lists(void)
> write_unlock(&css_set_lock);
> }
>
> +/**
> + * cgroup_next_descendant_pre - find the next descendant for pre-order walk
> + * @pos: the current position (%NULL to initiate traversal)
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * To be used by cgroup_for_each_descendant_pre(). Find the next
> + * descendant to visit for pre-order traversal of @cgroup's descendants.
> + */
> +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> + struct cgroup *cgroup)
> +{
> + struct cgroup *next;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + /* if first iteration, pretend we just visited @cgroup */
> + if (!pos) {
> + if (list_empty(&cgroup->children))
> + return NULL;
> + pos = cgroup;
> + }
> +
> + /* visit the first child if exists */
> + next = list_first_or_null_rcu(&pos->children, struct cgroup, sibling);
> + if (next)
> + return next;
> +
> + /* no child, visit my or the closest ancestor's next sibling */
> + do {
> + next = list_entry_rcu(pos->sibling.next, struct cgroup,
> + sibling);
> + if (&next->sibling != &pos->parent->children)
> + return next;
> +
> + pos = pos->parent;
> + } while (pos != cgroup);
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
> +
> +static struct cgroup *cgroup_leftmost_descendant(struct cgroup *pos)
> +{
> + struct cgroup *last;
> +
> + do {
> + last = pos;
> + pos = list_first_or_null_rcu(&pos->children, struct cgroup,
> + sibling);
> + } while (pos);
> +
> + return last;
> +}
> +
> +/**
> + * cgroup_next_descendant_post - find the next descendant for post-order walk
> + * @pos: the current position (%NULL to initiate traversal)
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * To be used by cgroup_for_each_descendant_post(). Find the next
> + * descendant to visit for post-order traversal of @cgroup's descendants.
> + */
> +struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
> + struct cgroup *cgroup)
> +{
> + struct cgroup *next;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + /* if first iteration, visit the leftmost descendant */
> + if (!pos) {
> + next = cgroup_leftmost_descendant(cgroup);
> + return next != cgroup ? next : NULL;
> + }
> +
> + /* if there's an unvisited sibling, visit its leftmost descendant */
> + next = list_entry_rcu(pos->sibling.next, struct cgroup, sibling);
> + if (&next->sibling != &pos->parent->children)
> + return cgroup_leftmost_descendant(next);
> +
> + /* no sibling left, visit parent */
> + next = pos->parent;
> + return next != cgroup ? next : NULL;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_next_descendant_post);
> +
> void cgroup_iter_start(struct cgroup *cgrp, struct cgroup_iter *it)
> __acquires(css_set_lock)
> {
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
2012-11-08 9:50 ` Michal Hocko
@ 2012-11-08 17:15 ` Tejun Heo
0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-08 17:15 UTC (permalink / raw)
To: Michal Hocko
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
Hey, Michal.
On Thu, Nov 08, 2012 at 10:50:13AM +0100, Michal Hocko wrote:
> On Sat 03-11-12 01:38:29, Tejun Heo wrote:
> > Currently, cgroup doesn't provide any generic helper for walking a
> > given cgroup's children or descendants. This patch adds the following
> > three macros.
> >
> > * cgroup_for_each_child() - walk immediate children of a cgroup.
> >
> > * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
> > in pre-order tree traversal.
> >
> > * cgroup_for_each_descendant_post() - visit all descendants of a
> > cgroup in post-order tree traversal.
> >
> > All three only require the user to hold RCU read lock during
> > traversal. Verifying that each iterated cgroup is online is the
> > responsibility of the user. When used with proper synchronization,
> > cgroup_for_each_descendant_pre() can be used to propagate config
> > updates to descendants in reliable way. See comments for details.
> >
> > Signed-off-by: Tejun Heo <tj@kernel.org>
>
> I will convert mem_cgroup_iter to use this rather than css_get_next
> after this gets into the next tree so that it can fly via Andrew.
>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
>
> Just a minor knit. You are talking about a config propagation while I
> would consider state propagation more clear and less confusing. Config
> is usually stable enough so that post_create is not necessary for
> syncing (e.g. memcg.swappiness). It is a state which must be consistent
> throughout the hierarchy which matters here.
Did s/config/state/g
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 3/9 v2] cgroup: implement generic child / descendant walk macros
2012-11-03 8:38 ` [PATCH 3/9] cgroup: implement generic child / descendant walk macros Tejun Heo
` (2 preceding siblings ...)
[not found] ` <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 17:59 ` Tejun Heo
[not found] ` <20121108175946.GA9672-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
3 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-08 17:59 UTC (permalink / raw)
To: lizefan, mhocko, rjw
Cc: containers, cgroups, linux-kernel, linux-pm, fweisbec
Currently, cgroup doesn't provide any generic helper for walking a
given cgroup's children or descendants. This patch adds the following
three macros.
* cgroup_for_each_child() - walk immediate children of a cgroup.
* cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
in pre-order tree traversal.
* cgroup_for_each_descendant_post() - visit all descendants of a
cgroup in post-order tree traversal.
All three only require the user to hold RCU read lock during
traversal. Verifying that each iterated cgroup is online is the
responsibility of the user. When used with proper synchronization,
cgroup_for_each_descendant_pre() can be used to propagate state
updates to descendants in reliable way. See comments for details.
v2: s/config/state/ in commit message and comments per Michal. More
documentation on synchronization rules.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
---
include/linux/cgroup.h | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/cgroup.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 180 insertions(+)
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -534,6 +534,100 @@ static inline struct cgroup* task_cgroup
return task_subsys_state(task, subsys_id)->cgroup;
}
+/**
+ * cgroup_for_each_child - iterate through children of a cgroup
+ * @pos: the cgroup * to use as the loop cursor
+ * @cgroup: cgroup whose children to walk
+ *
+ * Walk @cgroup's children. Must be called under rcu_read_lock(). A child
+ * cgroup which hasn't finished ->post_create() or already has finished
+ * ->pre_destroy() may show up during traversal and it's each subsystem's
+ * responsibility to verify that each @pos is alive.
+ *
+ * If a subsystem synchronizes against the parent in its ->post_create()
+ * and before starting iterating, a cgroup which finished ->post_create()
+ * is guaranteed to be visible in the future iterations.
+ */
+#define cgroup_for_each_child(pos, cgroup) \
+ list_for_each_entry_rcu(pos, &(cgroup)->children, sibling)
+
+struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
+ struct cgroup *cgroup);
+
+/**
+ * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
+ * @pos: the cgroup * to use as the loop cursor
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A
+ * descendant cgroup which hasn't finished ->post_create() or already has
+ * finished ->pre_destroy() may show up during traversal and it's each
+ * subsystem's responsibility to verify that each @pos is alive.
+ *
+ * If a subsystem synchronizes against the parent in its ->post_create()
+ * and before starting iterating, and synchronizes against @pos on each
+ * iteration, any descendant cgroup which finished ->post_create() is
+ * guaranteed to be visible in the future iterations.
+ *
+ * In other words, the following guarantees that a descendant can't escape
+ * state updates of its ancestors.
+ *
+ * my_post_create(@cgrp)
+ * {
+ * Lock @cgrp->parent and @cgrp;
+ * Inherit state from @cgrp->parent;
+ * Unlock both.
+ * }
+ *
+ * my_update_state(@cgrp)
+ * {
+ * Lock @cgrp;
+ * Update @cgrp's state;
+ * Unlock @cgrp;
+ *
+ * cgroup_for_each_descendant_pre(@pos, @cgrp) {
+ * Lock @pos;
+ * Verify @pos is alive and inherit state from @pos->parent;
+ * Unlock @pos;
+ * }
+ * }
+ *
+ * As long as the inheriting step, including checking the parent state, is
+ * enclosed inside @pos locking, double-locking the parent isn't necessary
+ * while inheriting. The state update to the parent is guaranteed to be
+ * visible by walking order and, as long as inheriting operations to the
+ * same @pos are atomic to each other, multiple updates racing each other
+ * still result in the correct state. It's guaranateed that at least one
+ * inheritance happens for any cgroup after the latest update to its
+ * parent.
+ *
+ * If checking parent's state requires locking the parent, each inheriting
+ * iteration should lock and unlock both @pos->parent and @pos.
+ *
+ * Alternatively, a subsystem may choose to use a single global lock to
+ * synchronize ->post_create() and ->pre_destroy() against tree-walking
+ * operations.
+ */
+#define cgroup_for_each_descendant_pre(pos, cgroup) \
+ for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos); \
+ pos = cgroup_next_descendant_pre((pos), (cgroup)))
+
+struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
+ struct cgroup *cgroup);
+
+/**
+ * cgroup_for_each_descendant_post - post-order walk of a cgroup's descendants
+ * @pos: the cgroup * to use as the loop cursor
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * Similar to cgroup_for_each_descendant_pre() but performs post-order
+ * traversal instead. Note that the walk visibility guarantee described in
+ * pre-order walk doesn't apply the same to post-order walks.
+ */
+#define cgroup_for_each_descendant_post(pos, cgroup) \
+ for (pos = cgroup_next_descendant_post(NULL, (cgroup)); (pos); \
+ pos = cgroup_next_descendant_post((pos), (cgroup)))
+
/* A cgroup_iter should be treated as an opaque object */
struct cgroup_iter {
struct list_head *cg_link;
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2984,6 +2984,92 @@ static void cgroup_enable_task_cg_lists(
write_unlock(&css_set_lock);
}
+/**
+ * cgroup_next_descendant_pre - find the next descendant for pre-order walk
+ * @pos: the current position (%NULL to initiate traversal)
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * To be used by cgroup_for_each_descendant_pre(). Find the next
+ * descendant to visit for pre-order traversal of @cgroup's descendants.
+ */
+struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
+ struct cgroup *cgroup)
+{
+ struct cgroup *next;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ /* if first iteration, pretend we just visited @cgroup */
+ if (!pos) {
+ if (list_empty(&cgroup->children))
+ return NULL;
+ pos = cgroup;
+ }
+
+ /* visit the first child if exists */
+ next = list_first_or_null_rcu(&pos->children, struct cgroup, sibling);
+ if (next)
+ return next;
+
+ /* no child, visit my or the closest ancestor's next sibling */
+ do {
+ next = list_entry_rcu(pos->sibling.next, struct cgroup,
+ sibling);
+ if (&next->sibling != &pos->parent->children)
+ return next;
+
+ pos = pos->parent;
+ } while (pos != cgroup);
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
+
+static struct cgroup *cgroup_leftmost_descendant(struct cgroup *pos)
+{
+ struct cgroup *last;
+
+ do {
+ last = pos;
+ pos = list_first_or_null_rcu(&pos->children, struct cgroup,
+ sibling);
+ } while (pos);
+
+ return last;
+}
+
+/**
+ * cgroup_next_descendant_post - find the next descendant for post-order walk
+ * @pos: the current position (%NULL to initiate traversal)
+ * @cgroup: cgroup whose descendants to walk
+ *
+ * To be used by cgroup_for_each_descendant_post(). Find the next
+ * descendant to visit for post-order traversal of @cgroup's descendants.
+ */
+struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
+ struct cgroup *cgroup)
+{
+ struct cgroup *next;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ /* if first iteration, visit the leftmost descendant */
+ if (!pos) {
+ next = cgroup_leftmost_descendant(cgroup);
+ return next != cgroup ? next : NULL;
+ }
+
+ /* if there's an unvisited sibling, visit its leftmost descendant */
+ next = list_entry_rcu(pos->sibling.next, struct cgroup, sibling);
+ if (&next->sibling != &pos->parent->children)
+ return cgroup_leftmost_descendant(next);
+
+ /* no sibling left, visit parent */
+ next = pos->parent;
+ return next != cgroup ? next : NULL;
+}
+EXPORT_SYMBOL_GPL(cgroup_next_descendant_post);
+
void cgroup_iter_start(struct cgroup *cgrp, struct cgroup_iter *it)
__acquires(css_set_lock)
{
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 4/9] cgroup_freezer: trivial cleanups
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (2 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 3/9] cgroup: implement generic child / descendant walk macros Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
[not found] ` <1351931915-1701-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-03 8:38 ` [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support Tejun Heo
` (5 subsequent siblings)
9 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
* Clean-up indentation and line-breaks. Drop the invalid comment
about freezer->lock.
* Make all internal functions take @freezer instead of both @cgroup
and @freezer.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 41 +++++++++++++++++++----------------------
1 file changed, 19 insertions(+), 22 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index bedefd9..975b3d8 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -29,17 +29,15 @@ enum freezer_state {
};
struct freezer {
- struct cgroup_subsys_state css;
- enum freezer_state state;
- spinlock_t lock; /* protects _writes_ to state */
+ struct cgroup_subsys_state css;
+ enum freezer_state state;
+ spinlock_t lock;
};
-static inline struct freezer *cgroup_freezer(
- struct cgroup *cgroup)
+static inline struct freezer *cgroup_freezer(struct cgroup *cgroup)
{
- return container_of(
- cgroup_subsys_state(cgroup, freezer_subsys_id),
- struct freezer, css);
+ return container_of(cgroup_subsys_state(cgroup, freezer_subsys_id),
+ struct freezer, css);
}
static inline struct freezer *task_freezer(struct task_struct *task)
@@ -180,8 +178,9 @@ out:
* migrated into or out of @cgroup, so we can't verify task states against
* @freezer state here. See freezer_attach() for details.
*/
-static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
+static void update_if_frozen(struct freezer *freezer)
{
+ struct cgroup *cgroup = freezer->css.cgroup;
struct cgroup_iter it;
struct task_struct *task;
@@ -211,12 +210,11 @@ notyet:
static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
struct seq_file *m)
{
- struct freezer *freezer;
+ struct freezer *freezer = cgroup_freezer(cgroup);
enum freezer_state state;
- freezer = cgroup_freezer(cgroup);
spin_lock_irq(&freezer->lock);
- update_if_frozen(cgroup, freezer);
+ update_if_frozen(freezer);
state = freezer->state;
spin_unlock_irq(&freezer->lock);
@@ -225,8 +223,9 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
return 0;
}
-static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
+static void freeze_cgroup(struct freezer *freezer)
{
+ struct cgroup *cgroup = freezer->css.cgroup;
struct cgroup_iter it;
struct task_struct *task;
@@ -236,8 +235,9 @@ static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
cgroup_iter_end(cgroup, &it);
}
-static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
+static void unfreeze_cgroup(struct freezer *freezer)
{
+ struct cgroup *cgroup = freezer->css.cgroup;
struct cgroup_iter it;
struct task_struct *task;
@@ -247,11 +247,9 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
cgroup_iter_end(cgroup, &it);
}
-static void freezer_change_state(struct cgroup *cgroup,
+static void freezer_change_state(struct freezer *freezer,
enum freezer_state goal_state)
{
- struct freezer *freezer = cgroup_freezer(cgroup);
-
/* also synchronizes against task migration, see freezer_attach() */
spin_lock_irq(&freezer->lock);
@@ -260,13 +258,13 @@ static void freezer_change_state(struct cgroup *cgroup,
if (freezer->state != CGROUP_THAWED)
atomic_dec(&system_freezing_cnt);
freezer->state = CGROUP_THAWED;
- unfreeze_cgroup(cgroup, freezer);
+ unfreeze_cgroup(freezer);
break;
case CGROUP_FROZEN:
if (freezer->state == CGROUP_THAWED)
atomic_inc(&system_freezing_cnt);
freezer->state = CGROUP_FREEZING;
- freeze_cgroup(cgroup, freezer);
+ freeze_cgroup(freezer);
break;
default:
BUG();
@@ -275,8 +273,7 @@ static void freezer_change_state(struct cgroup *cgroup,
spin_unlock_irq(&freezer->lock);
}
-static int freezer_write(struct cgroup *cgroup,
- struct cftype *cft,
+static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
const char *buffer)
{
enum freezer_state goal_state;
@@ -288,7 +285,7 @@ static int freezer_write(struct cgroup *cgroup,
else
return -EINVAL;
- freezer_change_state(cgroup, goal_state);
+ freezer_change_state(cgroup_freezer(cgroup), goal_state);
return 0;
}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (3 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 4/9] cgroup_freezer: trivial cleanups Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
[not found] ` <1351931915-1701-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 9:56 ` Michal Hocko
2012-11-03 8:38 ` [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags Tejun Heo
` (4 subsequent siblings)
9 siblings, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
* Make freezer_change_state() take bool @freeze instead of enum
freezer_state.
* Separate out freezer_apply_state() out of freezer_change_state().
This makes freezer_change_state() a rather silly thin wrapper. It
will be filled with hierarchy handling later on.
This patch doesn't introduce any behavior change.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 48 ++++++++++++++++++++++++++++++------------------
1 file changed, 30 insertions(+), 18 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 975b3d8..2690830 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -247,45 +247,57 @@ static void unfreeze_cgroup(struct freezer *freezer)
cgroup_iter_end(cgroup, &it);
}
-static void freezer_change_state(struct freezer *freezer,
- enum freezer_state goal_state)
+/**
+ * freezer_apply_state - apply state change to a single cgroup_freezer
+ * @freezer: freezer to apply state change to
+ * @freeze: whether to freeze or unfreeze
+ */
+static void freezer_apply_state(struct freezer *freezer, bool freeze)
{
/* also synchronizes against task migration, see freezer_attach() */
- spin_lock_irq(&freezer->lock);
+ lockdep_assert_held(&freezer->lock);
- switch (goal_state) {
- case CGROUP_THAWED:
- if (freezer->state != CGROUP_THAWED)
- atomic_dec(&system_freezing_cnt);
- freezer->state = CGROUP_THAWED;
- unfreeze_cgroup(freezer);
- break;
- case CGROUP_FROZEN:
+ if (freeze) {
if (freezer->state == CGROUP_THAWED)
atomic_inc(&system_freezing_cnt);
freezer->state = CGROUP_FREEZING;
freeze_cgroup(freezer);
- break;
- default:
- BUG();
+ } else {
+ if (freezer->state != CGROUP_THAWED)
+ atomic_dec(&system_freezing_cnt);
+ freezer->state = CGROUP_THAWED;
+ unfreeze_cgroup(freezer);
}
+}
+/**
+ * freezer_change_state - change the freezing state of a cgroup_freezer
+ * @freezer: freezer of interest
+ * @freeze: whether to freeze or thaw
+ *
+ * Freeze or thaw @cgroup according to @freeze.
+ */
+static void freezer_change_state(struct freezer *freezer, bool freeze)
+{
+ /* update @freezer */
+ spin_lock_irq(&freezer->lock);
+ freezer_apply_state(freezer, freeze);
spin_unlock_irq(&freezer->lock);
}
static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
const char *buffer)
{
- enum freezer_state goal_state;
+ bool freeze;
if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
- goal_state = CGROUP_THAWED;
+ freeze = false;
else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
- goal_state = CGROUP_FROZEN;
+ freeze = true;
else
return -EINVAL;
- freezer_change_state(cgroup_freezer(cgroup), goal_state);
+ freezer_change_state(cgroup_freezer(cgroup), freeze);
return 0;
}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support
[not found] ` <1351931915-1701-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 4:25 ` Kamezawa Hiroyuki
0 siblings, 0 replies; 74+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-08 4:25 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
(2012/11/03 17:38), Tejun Heo wrote:
> * Make freezer_change_state() take bool @freeze instead of enum
> freezer_state.
>
> * Separate out freezer_apply_state() out of freezer_change_state().
> This makes freezer_change_state() a rather silly thin wrapper. It
> will be filled with hierarchy handling later on.
>
> This patch doesn't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support
2012-11-03 8:38 ` [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support Tejun Heo
[not found] ` <1351931915-1701-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 9:56 ` Michal Hocko
1 sibling, 0 replies; 74+ messages in thread
From: Michal Hocko @ 2012-11-08 9:56 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
On Sat 03-11-12 01:38:31, Tejun Heo wrote:
> * Make freezer_change_state() take bool @freeze instead of enum
> freezer_state.
>
> * Separate out freezer_apply_state() out of freezer_change_state().
> This makes freezer_change_state() a rather silly thin wrapper. It
> will be filled with hierarchy handling later on.
>
> This patch doesn't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Makes sense
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> kernel/cgroup_freezer.c | 48 ++++++++++++++++++++++++++++++------------------
> 1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 975b3d8..2690830 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -247,45 +247,57 @@ static void unfreeze_cgroup(struct freezer *freezer)
> cgroup_iter_end(cgroup, &it);
> }
>
> -static void freezer_change_state(struct freezer *freezer,
> - enum freezer_state goal_state)
> +/**
> + * freezer_apply_state - apply state change to a single cgroup_freezer
> + * @freezer: freezer to apply state change to
> + * @freeze: whether to freeze or unfreeze
> + */
> +static void freezer_apply_state(struct freezer *freezer, bool freeze)
> {
> /* also synchronizes against task migration, see freezer_attach() */
> - spin_lock_irq(&freezer->lock);
> + lockdep_assert_held(&freezer->lock);
>
> - switch (goal_state) {
> - case CGROUP_THAWED:
> - if (freezer->state != CGROUP_THAWED)
> - atomic_dec(&system_freezing_cnt);
> - freezer->state = CGROUP_THAWED;
> - unfreeze_cgroup(freezer);
> - break;
> - case CGROUP_FROZEN:
> + if (freeze) {
> if (freezer->state == CGROUP_THAWED)
> atomic_inc(&system_freezing_cnt);
> freezer->state = CGROUP_FREEZING;
> freeze_cgroup(freezer);
> - break;
> - default:
> - BUG();
> + } else {
> + if (freezer->state != CGROUP_THAWED)
> + atomic_dec(&system_freezing_cnt);
> + freezer->state = CGROUP_THAWED;
> + unfreeze_cgroup(freezer);
> }
> +}
>
> +/**
> + * freezer_change_state - change the freezing state of a cgroup_freezer
> + * @freezer: freezer of interest
> + * @freeze: whether to freeze or thaw
> + *
> + * Freeze or thaw @cgroup according to @freeze.
> + */
> +static void freezer_change_state(struct freezer *freezer, bool freeze)
> +{
> + /* update @freezer */
> + spin_lock_irq(&freezer->lock);
> + freezer_apply_state(freezer, freeze);
> spin_unlock_irq(&freezer->lock);
> }
>
> static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
> const char *buffer)
> {
> - enum freezer_state goal_state;
> + bool freeze;
>
> if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> - goal_state = CGROUP_THAWED;
> + freeze = false;
> else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
> - goal_state = CGROUP_FROZEN;
> + freeze = true;
> else
> return -EINVAL;
>
> - freezer_change_state(cgroup_freezer(cgroup), goal_state);
> + freezer_change_state(cgroup_freezer(cgroup), freeze);
> return 0;
> }
>
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (4 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
[not found] ` <1351931915-1701-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-03 8:38 ` [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT] Tejun Heo
` (3 subsequent siblings)
9 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
freezer->state was an enum value - one of THAWED, FREEZING and FROZEN.
As the scheduled full hierarchy support requires more than one
freezing condition, switch it to mask of flags. If FREEZING is not
set, it's thawed. FREEZING is set if freezing or frozen. If frozen,
both FREEZING and FROZEN are set. Now that tasks can be attached to
an already frozen cgroup, this also makes freezing condition checks
more natural.
This patch doesn't introduce any behavior change.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 60 ++++++++++++++++++++++---------------------------
1 file changed, 27 insertions(+), 33 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 2690830..e76aa9f 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,15 +22,14 @@
#include <linux/freezer.h>
#include <linux/seq_file.h>
-enum freezer_state {
- CGROUP_THAWED = 0,
- CGROUP_FREEZING,
- CGROUP_FROZEN,
+enum freezer_state_flags {
+ CGROUP_FREEZING = (1 << 1), /* this freezer is freezing */
+ CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */
};
struct freezer {
struct cgroup_subsys_state css;
- enum freezer_state state;
+ unsigned int state;
spinlock_t lock;
};
@@ -48,12 +47,10 @@ static inline struct freezer *task_freezer(struct task_struct *task)
bool cgroup_freezing(struct task_struct *task)
{
- enum freezer_state state;
bool ret;
rcu_read_lock();
- state = task_freezer(task)->state;
- ret = state == CGROUP_FREEZING || state == CGROUP_FROZEN;
+ ret = task_freezer(task)->state & CGROUP_FREEZING;
rcu_read_unlock();
return ret;
@@ -63,10 +60,13 @@ bool cgroup_freezing(struct task_struct *task)
* cgroups_write_string() limits the size of freezer state strings to
* CGROUP_LOCAL_BUFFER_SIZE
*/
-static const char *freezer_state_strs[] = {
- "THAWED",
- "FREEZING",
- "FROZEN",
+static const char *freezer_state_strs(unsigned int state)
+{
+ if (state & CGROUP_FROZEN)
+ return "FROZEN";
+ if (state & CGROUP_FREEZING)
+ return "FREEZING";
+ return "THAWED";
};
/*
@@ -91,7 +91,6 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
return ERR_PTR(-ENOMEM);
spin_lock_init(&freezer->lock);
- freezer->state = CGROUP_THAWED;
return &freezer->css;
}
@@ -99,7 +98,7 @@ static void freezer_destroy(struct cgroup *cgroup)
{
struct freezer *freezer = cgroup_freezer(cgroup);
- if (freezer->state != CGROUP_THAWED)
+ if (freezer->state & CGROUP_FREEZING)
atomic_dec(&system_freezing_cnt);
kfree(freezer);
}
@@ -129,15 +128,13 @@ static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
* Tasks in @tset are on @new_cgrp but may not conform to its
* current state before executing the following - !frozen tasks may
* be visible in a FROZEN cgroup and frozen tasks in a THAWED one.
- * This means that, to determine whether to freeze, one should test
- * whether the state equals THAWED.
*/
cgroup_taskset_for_each(task, new_cgrp, tset) {
- if (freezer->state == CGROUP_THAWED) {
+ if (!(freezer->state & CGROUP_FREEZING)) {
__thaw_task(task);
} else {
freeze_task(task);
- freezer->state = CGROUP_FREEZING;
+ freezer->state &= ~CGROUP_FROZEN;
}
}
@@ -159,11 +156,7 @@ static void freezer_fork(struct task_struct *task)
goto out;
spin_lock_irq(&freezer->lock);
- /*
- * @task might have been just migrated into a FROZEN cgroup. Test
- * equality with THAWED. Read the comment in freezer_attach().
- */
- if (freezer->state != CGROUP_THAWED)
+ if (freezer->state & CGROUP_FREEZING)
freeze_task(task);
spin_unlock_irq(&freezer->lock);
out:
@@ -184,7 +177,8 @@ static void update_if_frozen(struct freezer *freezer)
struct cgroup_iter it;
struct task_struct *task;
- if (freezer->state != CGROUP_FREEZING)
+ if (!(freezer->state & CGROUP_FREEZING) ||
+ (freezer->state & CGROUP_FROZEN))
return;
cgroup_iter_start(cgroup, &it);
@@ -202,7 +196,7 @@ static void update_if_frozen(struct freezer *freezer)
}
}
- freezer->state = CGROUP_FROZEN;
+ freezer->state |= CGROUP_FROZEN;
notyet:
cgroup_iter_end(cgroup, &it);
}
@@ -211,14 +205,14 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
struct seq_file *m)
{
struct freezer *freezer = cgroup_freezer(cgroup);
- enum freezer_state state;
+ unsigned int state;
spin_lock_irq(&freezer->lock);
update_if_frozen(freezer);
state = freezer->state;
spin_unlock_irq(&freezer->lock);
- seq_puts(m, freezer_state_strs[state]);
+ seq_puts(m, freezer_state_strs(state));
seq_putc(m, '\n');
return 0;
}
@@ -258,14 +252,14 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
lockdep_assert_held(&freezer->lock);
if (freeze) {
- if (freezer->state == CGROUP_THAWED)
+ if (!(freezer->state & CGROUP_FREEZING))
atomic_inc(&system_freezing_cnt);
- freezer->state = CGROUP_FREEZING;
+ freezer->state |= CGROUP_FREEZING;
freeze_cgroup(freezer);
} else {
- if (freezer->state != CGROUP_THAWED)
+ if (freezer->state & CGROUP_FREEZING)
atomic_dec(&system_freezing_cnt);
- freezer->state = CGROUP_THAWED;
+ freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
unfreeze_cgroup(freezer);
}
}
@@ -290,9 +284,9 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
{
bool freeze;
- if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
+ if (strcmp(buffer, freezer_state_strs(0)) == 0)
freeze = false;
- else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
+ else if (strcmp(buffer, freezer_state_strs(CGROUP_FROZEN)) == 0)
freeze = true;
else
return -EINVAL;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (5 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
[not found] ` <1351931915-1701-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 12:47 ` Michal Hocko
2012-11-03 8:38 ` [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state Tejun Heo
` (2 subsequent siblings)
9 siblings, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
the two flags. This is to prepare for full hierarchy support.
freezer_apply_date() is updated such that it can handle setting and
clearing of both flags. The two flags are also exposed to userland
via read-only files self_freezing and parent_freezing.
Other than the added cgroupfs files, this patch doesn't introduce any
behavior change.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 47 insertions(+), 8 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e76aa9f..b8ad93c 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -23,8 +23,12 @@
#include <linux/seq_file.h>
enum freezer_state_flags {
- CGROUP_FREEZING = (1 << 1), /* this freezer is freezing */
+ CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
+ CGROUP_FREEZING_PARENT = (1 << 2), /* the parent freezer is freezing */
CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */
+
+ /* mask for all FREEZING flags */
+ CGROUP_FREEZING = CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
};
struct freezer {
@@ -245,8 +249,13 @@ static void unfreeze_cgroup(struct freezer *freezer)
* freezer_apply_state - apply state change to a single cgroup_freezer
* @freezer: freezer to apply state change to
* @freeze: whether to freeze or unfreeze
+ * @state: CGROUP_FREEZING_* flag to set or clear
+ *
+ * Set or clear @state on @cgroup according to @freeze, and perform
+ * freezing or thawing as necessary.
*/
-static void freezer_apply_state(struct freezer *freezer, bool freeze)
+static void freezer_apply_state(struct freezer *freezer, bool freeze,
+ unsigned int state)
{
/* also synchronizes against task migration, see freezer_attach() */
lockdep_assert_held(&freezer->lock);
@@ -254,13 +263,19 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
if (freeze) {
if (!(freezer->state & CGROUP_FREEZING))
atomic_inc(&system_freezing_cnt);
- freezer->state |= CGROUP_FREEZING;
+ freezer->state |= state;
freeze_cgroup(freezer);
} else {
- if (freezer->state & CGROUP_FREEZING)
- atomic_dec(&system_freezing_cnt);
- freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
- unfreeze_cgroup(freezer);
+ bool was_freezing = freezer->state & CGROUP_FREEZING;
+
+ freezer->state &= ~state;
+
+ if (!(freezer->state & CGROUP_FREEZING)) {
+ if (was_freezing)
+ atomic_dec(&system_freezing_cnt);
+ freezer->state &= ~CGROUP_FROZEN;
+ unfreeze_cgroup(freezer);
+ }
}
}
@@ -275,7 +290,7 @@ static void freezer_change_state(struct freezer *freezer, bool freeze)
{
/* update @freezer */
spin_lock_irq(&freezer->lock);
- freezer_apply_state(freezer, freeze);
+ freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
spin_unlock_irq(&freezer->lock);
}
@@ -295,6 +310,20 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
return 0;
}
+static u64 freezer_self_freezing_read(struct cgroup *cgroup, struct cftype *cft)
+{
+ struct freezer *freezer = cgroup_freezer(cgroup);
+
+ return (bool)(freezer->state & CGROUP_FREEZING_SELF);
+}
+
+static u64 freezer_parent_freezing_read(struct cgroup *cgroup, struct cftype *cft)
+{
+ struct freezer *freezer = cgroup_freezer(cgroup);
+
+ return (bool)(freezer->state & CGROUP_FREEZING_PARENT);
+}
+
static struct cftype files[] = {
{
.name = "state",
@@ -302,6 +331,16 @@ static struct cftype files[] = {
.read_seq_string = freezer_read,
.write_string = freezer_write,
},
+ {
+ .name = "self_freezing",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .read_u64 = freezer_self_freezing_read,
+ },
+ {
+ .name = "parent_freezing",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .read_u64 = freezer_parent_freezing_read,
+ },
{ } /* terminate */
};
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
[not found] ` <1351931915-1701-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 4:42 ` Kamezawa Hiroyuki
[not found] ` <509B382E.4030707-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
0 siblings, 1 reply; 74+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-08 4:42 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
(2012/11/03 17:38), Tejun Heo wrote:
> Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
> the two flags. This is to prepare for full hierarchy support.
>
> freezer_apply_date() is updated such that it can handle setting and
> clearing of both flags. The two flags are also exposed to userland
> via read-only files self_freezing and parent_freezing.
>
> Other than the added cgroupfs files, this patch doesn't introduce any
> behavior change.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
What is the purpose of FREEZING_SELG/PARENT, having 2 flags and
make it visible to users ?
Thanks,
-Kame
> ---
> kernel/cgroup_freezer.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index e76aa9f..b8ad93c 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,8 +23,12 @@
> #include <linux/seq_file.h>
>
> enum freezer_state_flags {
> - CGROUP_FREEZING = (1 << 1), /* this freezer is freezing */
> + CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
> + CGROUP_FREEZING_PARENT = (1 << 2), /* the parent freezer is freezing */
> CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */
> +
> + /* mask for all FREEZING flags */
> + CGROUP_FREEZING = CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
> };
>
> struct freezer {
> @@ -245,8 +249,13 @@ static void unfreeze_cgroup(struct freezer *freezer)
> * freezer_apply_state - apply state change to a single cgroup_freezer
> * @freezer: freezer to apply state change to
> * @freeze: whether to freeze or unfreeze
> + * @state: CGROUP_FREEZING_* flag to set or clear
> + *
> + * Set or clear @state on @cgroup according to @freeze, and perform
> + * freezing or thawing as necessary.
> */
> -static void freezer_apply_state(struct freezer *freezer, bool freeze)
> +static void freezer_apply_state(struct freezer *freezer, bool freeze,
> + unsigned int state)
> {
> /* also synchronizes against task migration, see freezer_attach() */
> lockdep_assert_held(&freezer->lock);
> @@ -254,13 +263,19 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
> if (freeze) {
> if (!(freezer->state & CGROUP_FREEZING))
> atomic_inc(&system_freezing_cnt);
> - freezer->state |= CGROUP_FREEZING;
> + freezer->state |= state;
> freeze_cgroup(freezer);
> } else {
> - if (freezer->state & CGROUP_FREEZING)
> - atomic_dec(&system_freezing_cnt);
> - freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
> - unfreeze_cgroup(freezer);
> + bool was_freezing = freezer->state & CGROUP_FREEZING;
> +
> + freezer->state &= ~state;
> +
> + if (!(freezer->state & CGROUP_FREEZING)) {
> + if (was_freezing)
> + atomic_dec(&system_freezing_cnt);
> + freezer->state &= ~CGROUP_FROZEN;
> + unfreeze_cgroup(freezer);
> + }
> }
> }
>
> @@ -275,7 +290,7 @@ static void freezer_change_state(struct freezer *freezer, bool freeze)
> {
> /* update @freezer */
> spin_lock_irq(&freezer->lock);
> - freezer_apply_state(freezer, freeze);
> + freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
> spin_unlock_irq(&freezer->lock);
> }
>
> @@ -295,6 +310,20 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
> return 0;
> }
>
> +static u64 freezer_self_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> + struct freezer *freezer = cgroup_freezer(cgroup);
> +
> + return (bool)(freezer->state & CGROUP_FREEZING_SELF);
> +}
> +
> +static u64 freezer_parent_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> + struct freezer *freezer = cgroup_freezer(cgroup);
> +
> + return (bool)(freezer->state & CGROUP_FREEZING_PARENT);
> +}
> +
> static struct cftype files[] = {
> {
> .name = "state",
> @@ -302,6 +331,16 @@ static struct cftype files[] = {
> .read_seq_string = freezer_read,
> .write_string = freezer_write,
> },
> + {
> + .name = "self_freezing",
> + .flags = CFTYPE_NOT_ON_ROOT,
> + .read_u64 = freezer_self_freezing_read,
> + },
> + {
> + .name = "parent_freezing",
> + .flags = CFTYPE_NOT_ON_ROOT,
> + .read_u64 = freezer_parent_freezing_read,
> + },
> { } /* terminate */
> };
>
>
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
2012-11-03 8:38 ` [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT] Tejun Heo
[not found] ` <1351931915-1701-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 12:47 ` Michal Hocko
2012-11-08 14:42 ` Tejun Heo
1 sibling, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2012-11-08 12:47 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
On Sat 03-11-12 01:38:33, Tejun Heo wrote:
> Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
> the two flags. This is to prepare for full hierarchy support.
>
> freezer_apply_date() is updated such that it can handle setting and
> clearing of both flags. The two flags are also exposed to userland
> via read-only files self_freezing and parent_freezing.
>
> Other than the added cgroupfs files, this patch doesn't introduce any
> behavior change.
OK, I can imagine that this might be useful to find the first parent
group that needs to be thawed to unfreeze the group I am interested
in. Could you also update Documentation/cgroups/freezer-subsystem.txt to
clarify the intended usage?
Minor nit. Same as mentioned in the previous patch freezer_apply_state
should get enum freezer_state_flags state parameter.
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> kernel/cgroup_freezer.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index e76aa9f..b8ad93c 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,8 +23,12 @@
> #include <linux/seq_file.h>
>
> enum freezer_state_flags {
> - CGROUP_FREEZING = (1 << 1), /* this freezer is freezing */
> + CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
> + CGROUP_FREEZING_PARENT = (1 << 2), /* the parent freezer is freezing */
> CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */
> +
> + /* mask for all FREEZING flags */
> + CGROUP_FREEZING = CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
> };
>
> struct freezer {
> @@ -245,8 +249,13 @@ static void unfreeze_cgroup(struct freezer *freezer)
> * freezer_apply_state - apply state change to a single cgroup_freezer
> * @freezer: freezer to apply state change to
> * @freeze: whether to freeze or unfreeze
> + * @state: CGROUP_FREEZING_* flag to set or clear
> + *
> + * Set or clear @state on @cgroup according to @freeze, and perform
> + * freezing or thawing as necessary.
> */
> -static void freezer_apply_state(struct freezer *freezer, bool freeze)
> +static void freezer_apply_state(struct freezer *freezer, bool freeze,
> + unsigned int state)
> {
> /* also synchronizes against task migration, see freezer_attach() */
> lockdep_assert_held(&freezer->lock);
> @@ -254,13 +263,19 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
> if (freeze) {
> if (!(freezer->state & CGROUP_FREEZING))
> atomic_inc(&system_freezing_cnt);
> - freezer->state |= CGROUP_FREEZING;
> + freezer->state |= state;
> freeze_cgroup(freezer);
> } else {
> - if (freezer->state & CGROUP_FREEZING)
> - atomic_dec(&system_freezing_cnt);
> - freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
> - unfreeze_cgroup(freezer);
> + bool was_freezing = freezer->state & CGROUP_FREEZING;
> +
> + freezer->state &= ~state;
> +
> + if (!(freezer->state & CGROUP_FREEZING)) {
> + if (was_freezing)
> + atomic_dec(&system_freezing_cnt);
> + freezer->state &= ~CGROUP_FROZEN;
> + unfreeze_cgroup(freezer);
> + }
> }
> }
>
> @@ -275,7 +290,7 @@ static void freezer_change_state(struct freezer *freezer, bool freeze)
> {
> /* update @freezer */
> spin_lock_irq(&freezer->lock);
> - freezer_apply_state(freezer, freeze);
> + freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
> spin_unlock_irq(&freezer->lock);
> }
>
> @@ -295,6 +310,20 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
> return 0;
> }
>
> +static u64 freezer_self_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> + struct freezer *freezer = cgroup_freezer(cgroup);
> +
> + return (bool)(freezer->state & CGROUP_FREEZING_SELF);
> +}
> +
> +static u64 freezer_parent_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> + struct freezer *freezer = cgroup_freezer(cgroup);
> +
> + return (bool)(freezer->state & CGROUP_FREEZING_PARENT);
> +}
> +
> static struct cftype files[] = {
> {
> .name = "state",
> @@ -302,6 +331,16 @@ static struct cftype files[] = {
> .read_seq_string = freezer_read,
> .write_string = freezer_write,
> },
> + {
> + .name = "self_freezing",
> + .flags = CFTYPE_NOT_ON_ROOT,
> + .read_u64 = freezer_self_freezing_read,
> + },
> + {
> + .name = "parent_freezing",
> + .flags = CFTYPE_NOT_ON_ROOT,
> + .read_u64 = freezer_parent_freezing_read,
> + },
> { } /* terminate */
> };
>
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
2012-11-08 12:47 ` Michal Hocko
@ 2012-11-08 14:42 ` Tejun Heo
0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-08 14:42 UTC (permalink / raw)
To: Michal Hocko
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
Hello,
On Thu, Nov 08, 2012 at 01:47:55PM +0100, Michal Hocko wrote:
> On Sat 03-11-12 01:38:33, Tejun Heo wrote:
> > Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
> > the two flags. This is to prepare for full hierarchy support.
> >
> > freezer_apply_date() is updated such that it can handle setting and
> > clearing of both flags. The two flags are also exposed to userland
> > via read-only files self_freezing and parent_freezing.
> >
> > Other than the added cgroupfs files, this patch doesn't introduce any
> > behavior change.
>
> OK, I can imagine that this might be useful to find the first parent
> group that needs to be thawed to unfreeze the group I am interested
> in. Could you also update Documentation/cgroups/freezer-subsystem.txt to
> clarify the intended usage?
Ooh, right. Will update the doc in the last patch.
> Minor nit. Same as mentioned in the previous patch freezer_apply_state
> should get enum freezer_state_flags state parameter.
But it isn't an enum value.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (6 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT] Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
[not found] ` <1351931915-1701-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-08 13:23 ` Michal Hocko
2012-11-03 8:38 ` [PATCH 9/9] cgroup_freezer: implement proper hierarchy support Tejun Heo
2012-11-09 17:15 ` [PATCHSET cgroup/for-3.8] " Tejun Heo
9 siblings, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
A cgroup is online and visible to iteration between ->post_create()
and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and
toggles it from the newly added freezer_post_create() and
freezer_pre_destroy() while holding freezer->lock such that a
cgroup_freezer can be reilably distinguished to be online. This will
be used by full hierarchy support.
ONLINE test is added to freezer_apply_state() but it currently doesn't
make any difference as freezer_write() can only be called for an
online cgroup.
Adjusting system_freezing_cnt on destruction is moved from
freezer_destroy() to the new freezer_pre_destroy() for consistency.
This patch doesn't introduce any noticeable behavior change.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 42 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index b8ad93c..4f12d31 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -23,6 +23,7 @@
#include <linux/seq_file.h>
enum freezer_state_flags {
+ CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
CGROUP_FREEZING_PARENT = (1 << 2), /* the parent freezer is freezing */
CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */
@@ -98,13 +99,45 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
return &freezer->css;
}
-static void freezer_destroy(struct cgroup *cgroup)
+/**
+ * freezer_post_create - commit creation of a freezer cgroup
+ * @cgroup: cgroup being created
+ *
+ * We're committing to creation of @cgroup. Mark it online.
+ */
+static void freezer_post_create(struct cgroup *cgroup)
{
struct freezer *freezer = cgroup_freezer(cgroup);
+ spin_lock_irq(&freezer->lock);
+ freezer->state |= CGROUP_FREEZER_ONLINE;
+ spin_unlock_irq(&freezer->lock);
+}
+
+/**
+ * freezer_pre_destroy - initiate destruction of @cgroup
+ * @cgroup: cgroup being destroyed
+ *
+ * @cgroup is going away. Mark it dead and decrement system_freezing_count
+ * if it was holding one.
+ */
+static void freezer_pre_destroy(struct cgroup *cgroup)
+{
+ struct freezer *freezer = cgroup_freezer(cgroup);
+
+ spin_lock_irq(&freezer->lock);
+
if (freezer->state & CGROUP_FREEZING)
atomic_dec(&system_freezing_cnt);
- kfree(freezer);
+
+ freezer->state = 0;
+
+ spin_unlock_irq(&freezer->lock);
+}
+
+static void freezer_destroy(struct cgroup *cgroup)
+{
+ kfree(cgroup_freezer(cgroup));
}
/*
@@ -260,6 +293,9 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
/* also synchronizes against task migration, see freezer_attach() */
lockdep_assert_held(&freezer->lock);
+ if (!(freezer->state & CGROUP_FREEZER_ONLINE))
+ return;
+
if (freeze) {
if (!(freezer->state & CGROUP_FREEZING))
atomic_inc(&system_freezing_cnt);
@@ -347,6 +383,8 @@ static struct cftype files[] = {
struct cgroup_subsys freezer_subsys = {
.name = "freezer",
.create = freezer_create,
+ .post_create = freezer_post_create,
+ .pre_destroy = freezer_pre_destroy,
.destroy = freezer_destroy,
.subsys_id = freezer_subsys_id,
.attach = freezer_attach,
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state
[not found] ` <1351931915-1701-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 4:48 ` Kamezawa Hiroyuki
[not found] ` <509B3999.6060505-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
0 siblings, 1 reply; 74+ messages in thread
From: Kamezawa Hiroyuki @ 2012-11-08 4:48 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w
(2012/11/03 17:38), Tejun Heo wrote:
> A cgroup is online and visible to iteration between ->post_create()
> and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and
> toggles it from the newly added freezer_post_create() and
> freezer_pre_destroy() while holding freezer->lock such that a
> cgroup_freezer can be reilably distinguished to be online. This will
> be used by full hierarchy support.
>
> ONLINE test is added to freezer_apply_state() but it currently doesn't
> make any difference as freezer_write() can only be called for an
> online cgroup.
>
> Adjusting system_freezing_cnt on destruction is moved from
> freezer_destroy() to the new freezer_pre_destroy() for consistency.
>
> This patch doesn't introduce any noticeable behavior change.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> kernel/cgroup_freezer.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index b8ad93c..4f12d31 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,6 +23,7 @@
> #include <linux/seq_file.h>
>
> enum freezer_state_flags {
> + CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
Could you explain what 'online' means here again, rather than changelog ?
BTW, 'online' is a shared concept, between post_create() and pre_destroy(), among
developpers ? Is it new ?
Anyway, the patch itself is simple.
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state
2012-11-03 8:38 ` [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state Tejun Heo
[not found] ` <1351931915-1701-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-08 13:23 ` Michal Hocko
[not found] ` <20121108132306.GH31821-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
1 sibling, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2012-11-08 13:23 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
On Sat 03-11-12 01:38:34, Tejun Heo wrote:
> A cgroup is online and visible to iteration between ->post_create()
> and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and
> toggles it from the newly added freezer_post_create() and
> freezer_pre_destroy() while holding freezer->lock such that a
> cgroup_freezer can be reilably distinguished to be online. This will
> be used by full hierarchy support.
I am thinking whether freezer_pre_destroy is really needed. Once we
reach pre_destroy then there are no tasks nor any children in the group
so there is nobody to wake up if the group was frozen and the destroy
callback is called after synchronize_rcu so the traversing should be
safe.
> ONLINE test is added to freezer_apply_state() but it currently doesn't
> make any difference as freezer_write() can only be called for an
> online cgroup.
>
> Adjusting system_freezing_cnt on destruction is moved from
> freezer_destroy() to the new freezer_pre_destroy() for consistency.
>
> This patch doesn't introduce any noticeable behavior change.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> kernel/cgroup_freezer.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index b8ad93c..4f12d31 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,6 +23,7 @@
> #include <linux/seq_file.h>
>
> enum freezer_state_flags {
> + CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
> CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
> CGROUP_FREEZING_PARENT = (1 << 2), /* the parent freezer is freezing */
> CGROUP_FROZEN = (1 << 3), /* this and its descendants frozen */
> @@ -98,13 +99,45 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
> return &freezer->css;
> }
>
> -static void freezer_destroy(struct cgroup *cgroup)
> +/**
> + * freezer_post_create - commit creation of a freezer cgroup
> + * @cgroup: cgroup being created
> + *
> + * We're committing to creation of @cgroup. Mark it online.
> + */
> +static void freezer_post_create(struct cgroup *cgroup)
> {
> struct freezer *freezer = cgroup_freezer(cgroup);
>
> + spin_lock_irq(&freezer->lock);
> + freezer->state |= CGROUP_FREEZER_ONLINE;
> + spin_unlock_irq(&freezer->lock);
> +}
> +
> +/**
> + * freezer_pre_destroy - initiate destruction of @cgroup
> + * @cgroup: cgroup being destroyed
> + *
> + * @cgroup is going away. Mark it dead and decrement system_freezing_count
> + * if it was holding one.
> + */
> +static void freezer_pre_destroy(struct cgroup *cgroup)
> +{
> + struct freezer *freezer = cgroup_freezer(cgroup);
> +
> + spin_lock_irq(&freezer->lock);
> +
> if (freezer->state & CGROUP_FREEZING)
> atomic_dec(&system_freezing_cnt);
> - kfree(freezer);
> +
> + freezer->state = 0;
> +
> + spin_unlock_irq(&freezer->lock);
> +}
> +
> +static void freezer_destroy(struct cgroup *cgroup)
> +{
> + kfree(cgroup_freezer(cgroup));
> }
>
> /*
> @@ -260,6 +293,9 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
> /* also synchronizes against task migration, see freezer_attach() */
> lockdep_assert_held(&freezer->lock);
>
> + if (!(freezer->state & CGROUP_FREEZER_ONLINE))
> + return;
> +
> if (freeze) {
> if (!(freezer->state & CGROUP_FREEZING))
> atomic_inc(&system_freezing_cnt);
> @@ -347,6 +383,8 @@ static struct cftype files[] = {
> struct cgroup_subsys freezer_subsys = {
> .name = "freezer",
> .create = freezer_create,
> + .post_create = freezer_post_create,
> + .pre_destroy = freezer_pre_destroy,
> .destroy = freezer_destroy,
> .subsys_id = freezer_subsys_id,
> .attach = freezer_attach,
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 9/9] cgroup_freezer: implement proper hierarchy support
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (7 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state Tejun Heo
@ 2012-11-03 8:38 ` Tejun Heo
2012-11-07 11:00 ` Michal Hocko
[not found] ` <1351931915-1701-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-09 17:15 ` [PATCHSET cgroup/for-3.8] " Tejun Heo
9 siblings, 2 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-03 8:38 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.
This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.
freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.
Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.
As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.
Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned via .broken_hierarchy.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 161 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 123 insertions(+), 38 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 4f12d31..3262537 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,6 +22,13 @@
#include <linux/freezer.h>
#include <linux/seq_file.h>
+/*
+ * A cgroup is freezing if any FREEZING flags are set. FREEZING_SELF is
+ * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared
+ * for "THAWED". FREEZING_PARENT is set if the parent freezer is FREEZING
+ * for whatever reason. IOW, a cgroup has FREEZING_PARENT set if one of
+ * its ancestors has FREEZING_SELF set.
+ */
enum freezer_state_flags {
CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
@@ -50,6 +57,15 @@ static inline struct freezer *task_freezer(struct task_struct *task)
struct freezer, css);
}
+static struct freezer *parent_freezer(struct freezer *freezer)
+{
+ struct cgroup *pcg = freezer->css.cgroup->parent;
+
+ if (pcg)
+ return cgroup_freezer(pcg);
+ return NULL;
+}
+
bool cgroup_freezing(struct task_struct *task)
{
bool ret;
@@ -74,17 +90,6 @@ static const char *freezer_state_strs(unsigned int state)
return "THAWED";
};
-/*
- * State diagram
- * Transitions are caused by userspace writes to the freezer.state file.
- * The values in parenthesis are state labels. The rest are edge labels.
- *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- * ^ ^ | |
- * | \_______THAWED_______/ |
- * \__________________________THAWED____________/
- */
-
struct cgroup_subsys freezer_subsys;
static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
@@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
* freezer_post_create - commit creation of a freezer cgroup
* @cgroup: cgroup being created
*
- * We're committing to creation of @cgroup. Mark it online.
+ * We're committing to creation of @cgroup. Mark it online and inherit
+ * parent's freezing state while holding both parent's and our
+ * freezer->lock.
*/
static void freezer_post_create(struct cgroup *cgroup)
{
struct freezer *freezer = cgroup_freezer(cgroup);
+ struct freezer *parent = parent_freezer(freezer);
+
+ /*
+ * The following double locking and freezing state inheritance
+ * guarantee that @cgroup can never escape ancestors' freezing
+ * states. See cgroup_for_each_descendant_pre() for details.
+ */
+ if (parent)
+ spin_lock_irq(&parent->lock);
+ spin_lock_nested(&freezer->lock, SINGLE_DEPTH_NESTING);
- spin_lock_irq(&freezer->lock);
freezer->state |= CGROUP_FREEZER_ONLINE;
- spin_unlock_irq(&freezer->lock);
+
+ if (parent && (parent->state & CGROUP_FREEZING)) {
+ freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
+ atomic_inc(&system_freezing_cnt);
+ }
+
+ spin_unlock(&freezer->lock);
+ if (parent)
+ spin_unlock_irq(&parent->lock);
}
/**
@@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
{
struct freezer *freezer = cgroup_freezer(new_cgrp);
struct task_struct *task;
+ bool clear_frozen = false;
spin_lock_irq(&freezer->lock);
@@ -172,10 +197,25 @@ static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
} else {
freeze_task(task);
freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = true;
}
}
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Propagate FROZEN clearing upwards. We may race with
+ * update_if_frozen(), but as long as both work bottom-up, either
+ * update_if_frozen() sees child's FROZEN cleared or we clear the
+ * parent's FROZEN later. No parent w/ !FROZEN children can be
+ * left FROZEN.
+ */
+ while (clear_frozen && (freezer = parent_freezer(freezer))) {
+ spin_lock_irq(&freezer->lock);
+ freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = freezer->state & CGROUP_FREEZING;
+ spin_unlock_irq(&freezer->lock);
+ }
}
static void freezer_fork(struct task_struct *task)
@@ -200,24 +240,47 @@ out:
rcu_read_unlock();
}
-/*
- * We change from FREEZING to FROZEN lazily if the cgroup was only
- * partially frozen when we exitted write. Caller must hold freezer->lock.
+/**
+ * update_if_frozen - update whether a cgroup finished freezing
+ * @cgroup: cgroup of interest
+ *
+ * Once FREEZING is initiated, transition to FROZEN is lazily updated by
+ * calling this function. If the current state is FREEZING but not FROZEN,
+ * this function checks whether all tasks of this cgroup and the descendant
+ * cgroups finished freezing and, if so, sets FROZEN.
+ *
+ * The caller is responsible for grabbing RCU read lock and calling
+ * update_if_frozen() on all descendants prior to invoking this function.
*
* Task states and freezer state might disagree while tasks are being
* migrated into or out of @cgroup, so we can't verify task states against
* @freezer state here. See freezer_attach() for details.
*/
-static void update_if_frozen(struct freezer *freezer)
+static void update_if_frozen(struct cgroup *cgroup)
{
- struct cgroup *cgroup = freezer->css.cgroup;
+ struct freezer *freezer = cgroup_freezer(cgroup);
+ struct cgroup *pos;
struct cgroup_iter it;
struct task_struct *task;
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ spin_lock_irq(&freezer->lock);
+
if (!(freezer->state & CGROUP_FREEZING) ||
(freezer->state & CGROUP_FROZEN))
- return;
+ goto out_unlock;
+
+ /* are all (live) children frozen? */
+ cgroup_for_each_child(pos, cgroup) {
+ struct freezer *child = cgroup_freezer(pos);
+ if ((child->state & CGROUP_FREEZER_ONLINE) &&
+ !(child->state & CGROUP_FROZEN))
+ goto out_unlock;
+ }
+
+ /* are all tasks frozen? */
cgroup_iter_start(cgroup, &it);
while ((task = cgroup_iter_next(cgroup, &it))) {
@@ -229,27 +292,32 @@ static void update_if_frozen(struct freezer *freezer)
* the usual frozen condition.
*/
if (!frozen(task) && !freezer_should_skip(task))
- goto notyet;
+ goto out_iter_end;
}
}
freezer->state |= CGROUP_FROZEN;
-notyet:
+out_iter_end:
cgroup_iter_end(cgroup, &it);
+out_unlock:
+ spin_unlock_irq(&freezer->lock);
}
static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
struct seq_file *m)
{
- struct freezer *freezer = cgroup_freezer(cgroup);
- unsigned int state;
+ struct cgroup *pos;
- spin_lock_irq(&freezer->lock);
- update_if_frozen(freezer);
- state = freezer->state;
- spin_unlock_irq(&freezer->lock);
+ rcu_read_lock();
- seq_puts(m, freezer_state_strs(state));
+ /* update states bottom-up */
+ cgroup_for_each_descendant_post(pos, cgroup)
+ update_if_frozen(pos);
+ update_if_frozen(cgroup);
+
+ rcu_read_unlock();
+
+ seq_puts(m, freezer_state_strs(cgroup_freezer(cgroup)->state));
seq_putc(m, '\n');
return 0;
}
@@ -320,14 +388,39 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
* @freezer: freezer of interest
* @freeze: whether to freeze or thaw
*
- * Freeze or thaw @cgroup according to @freeze.
+ * Freeze or thaw @freezer according to @freeze. The operations are
+ * recursive - all descendants of @freezer will be affected.
*/
static void freezer_change_state(struct freezer *freezer, bool freeze)
{
+ struct cgroup *pos;
+
/* update @freezer */
spin_lock_irq(&freezer->lock);
freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Update all its descendants in pre-order traversal. Each
+ * descendant will try to inherit its parent's FREEZING state as
+ * CGROUP_FREEZING_PARENT.
+ */
+ rcu_read_lock();
+ cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
+ struct freezer *pos_f = cgroup_freezer(pos);
+ struct freezer *parent = parent_freezer(freezer);
+
+ /*
+ * Our update to @parent->state is already visible which is
+ * all we need. No need to lock @parent. For more info on
+ * synchronization, see freezer_post_create().
+ */
+ spin_lock_irq(&pos_f->lock);
+ freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
+ CGROUP_FREEZING_PARENT);
+ spin_unlock_irq(&pos_f->lock);
+ }
+ rcu_read_unlock();
}
static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
@@ -390,12 +483,4 @@ struct cgroup_subsys freezer_subsys = {
.attach = freezer_attach,
.fork = freezer_fork,
.base_cftypes = files,
-
- /*
- * freezer subsys doesn't handle hierarchy at all. Frozen state
- * should be inherited through the hierarchy - if a parent is
- * frozen, all its children should be frozen. Fix it and remove
- * the following.
- */
- .broken_hierarchy = true,
};
--
1.7.11.7
^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 9/9] cgroup_freezer: implement proper hierarchy support
2012-11-03 8:38 ` [PATCH 9/9] cgroup_freezer: implement proper hierarchy support Tejun Heo
@ 2012-11-07 11:00 ` Michal Hocko
2012-11-07 16:31 ` Tejun Heo
[not found] ` <1351931915-1701-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
1 sibling, 1 reply; 74+ messages in thread
From: Michal Hocko @ 2012-11-07 11:00 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
On Sat 03-11-12 01:38:35, Tejun Heo wrote:
[...]
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 4f12d31..3262537 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
[...]
> @@ -320,14 +388,39 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
> * @freezer: freezer of interest
> * @freeze: whether to freeze or thaw
> *
> - * Freeze or thaw @cgroup according to @freeze.
> + * Freeze or thaw @freezer according to @freeze. The operations are
> + * recursive - all descendants of @freezer will be affected.
> */
> static void freezer_change_state(struct freezer *freezer, bool freeze)
> {
> + struct cgroup *pos;
> +
> /* update @freezer */
> spin_lock_irq(&freezer->lock);
> freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
> spin_unlock_irq(&freezer->lock);
> +
> + /*
> + * Update all its descendants in pre-order traversal. Each
> + * descendant will try to inherit its parent's FREEZING state as
> + * CGROUP_FREEZING_PARENT.
> + */
> + rcu_read_lock();
> + cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
> + struct freezer *pos_f = cgroup_freezer(pos);
> + struct freezer *parent = parent_freezer(freezer);
This doesn't seem right. Why are we interested in freezer->parent here
at all? We should either care about freezer->state & CGROUP_FREEZING or
parent = parent_freezer(pos_f), right?
> +
> + /*
> + * Our update to @parent->state is already visible which is
> + * all we need. No need to lock @parent. For more info on
> + * synchronization, see freezer_post_create().
> + */
> + spin_lock_irq(&pos_f->lock);
> + freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
> + CGROUP_FREEZING_PARENT);
> + spin_unlock_irq(&pos_f->lock);
> + }
> + rcu_read_unlock();
> }
>
> static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 9/9] cgroup_freezer: implement proper hierarchy support
2012-11-07 11:00 ` Michal Hocko
@ 2012-11-07 16:31 ` Tejun Heo
0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-07 16:31 UTC (permalink / raw)
To: Michal Hocko
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
Hello, Michal.
On Wed, Nov 07, 2012 at 12:00:57PM +0100, Michal Hocko wrote:
> > + * Update all its descendants in pre-order traversal. Each
> > + * descendant will try to inherit its parent's FREEZING state as
> > + * CGROUP_FREEZING_PARENT.
> > + */
> > + rcu_read_lock();
> > + cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
> > + struct freezer *pos_f = cgroup_freezer(pos);
> > + struct freezer *parent = parent_freezer(freezer);
>
> This doesn't seem right. Why are we interested in freezer->parent here
> at all? We should either care about freezer->state & CGROUP_FREEZING or
> parent = parent_freezer(pos_f), right?
Yeap, that should have been parent_freezer(pos_f). Argh...
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 74+ messages in thread
[parent not found: <1351931915-1701-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support
[not found] ` <1351931915-1701-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2012-11-07 16:39 ` Tejun Heo
[not found] ` <20121107163919.GC2660-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2012-11-08 17:57 ` [PATCH 9/9 v3] " Tejun Heo
1 sibling, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-07 16:39 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.
This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.
freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.
Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.
As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.
Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned via .broken_hierarchy.
v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor. Fixed.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 161 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 123 insertions(+), 38 deletions(-)
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,6 +22,13 @@
#include <linux/freezer.h>
#include <linux/seq_file.h>
+/*
+ * A cgroup is freezing if any FREEZING flags are set. FREEZING_SELF is
+ * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared
+ * for "THAWED". FREEZING_PARENT is set if the parent freezer is FREEZING
+ * for whatever reason. IOW, a cgroup has FREEZING_PARENT set if one of
+ * its ancestors has FREEZING_SELF set.
+ */
enum freezer_state_flags {
CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
@@ -50,6 +57,15 @@ static inline struct freezer *task_freez
struct freezer, css);
}
+static struct freezer *parent_freezer(struct freezer *freezer)
+{
+ struct cgroup *pcg = freezer->css.cgroup->parent;
+
+ if (pcg)
+ return cgroup_freezer(pcg);
+ return NULL;
+}
+
bool cgroup_freezing(struct task_struct *task)
{
bool ret;
@@ -74,17 +90,6 @@ static const char *freezer_state_strs(un
return "THAWED";
};
-/*
- * State diagram
- * Transitions are caused by userspace writes to the freezer.state file.
- * The values in parenthesis are state labels. The rest are edge labels.
- *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- * ^ ^ | |
- * | \_______THAWED_______/ |
- * \__________________________THAWED____________/
- */
-
struct cgroup_subsys freezer_subsys;
static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
@@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freez
* freezer_post_create - commit creation of a freezer cgroup
* @cgroup: cgroup being created
*
- * We're committing to creation of @cgroup. Mark it online.
+ * We're committing to creation of @cgroup. Mark it online and inherit
+ * parent's freezing state while holding both parent's and our
+ * freezer->lock.
*/
static void freezer_post_create(struct cgroup *cgroup)
{
struct freezer *freezer = cgroup_freezer(cgroup);
+ struct freezer *parent = parent_freezer(freezer);
+
+ /*
+ * The following double locking and freezing state inheritance
+ * guarantee that @cgroup can never escape ancestors' freezing
+ * states. See cgroup_for_each_descendant_pre() for details.
+ */
+ if (parent)
+ spin_lock_irq(&parent->lock);
+ spin_lock_nested(&freezer->lock, SINGLE_DEPTH_NESTING);
- spin_lock_irq(&freezer->lock);
freezer->state |= CGROUP_FREEZER_ONLINE;
- spin_unlock_irq(&freezer->lock);
+
+ if (parent && (parent->state & CGROUP_FREEZING)) {
+ freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
+ atomic_inc(&system_freezing_cnt);
+ }
+
+ spin_unlock(&freezer->lock);
+ if (parent)
+ spin_unlock_irq(&parent->lock);
}
/**
@@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup
{
struct freezer *freezer = cgroup_freezer(new_cgrp);
struct task_struct *task;
+ bool clear_frozen = false;
spin_lock_irq(&freezer->lock);
@@ -172,10 +197,25 @@ static void freezer_attach(struct cgroup
} else {
freeze_task(task);
freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = true;
}
}
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Propagate FROZEN clearing upwards. We may race with
+ * update_if_frozen(), but as long as both work bottom-up, either
+ * update_if_frozen() sees child's FROZEN cleared or we clear the
+ * parent's FROZEN later. No parent w/ !FROZEN children can be
+ * left FROZEN.
+ */
+ while (clear_frozen && (freezer = parent_freezer(freezer))) {
+ spin_lock_irq(&freezer->lock);
+ freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = freezer->state & CGROUP_FREEZING;
+ spin_unlock_irq(&freezer->lock);
+ }
}
static void freezer_fork(struct task_struct *task)
@@ -200,24 +240,47 @@ out:
rcu_read_unlock();
}
-/*
- * We change from FREEZING to FROZEN lazily if the cgroup was only
- * partially frozen when we exitted write. Caller must hold freezer->lock.
+/**
+ * update_if_frozen - update whether a cgroup finished freezing
+ * @cgroup: cgroup of interest
+ *
+ * Once FREEZING is initiated, transition to FROZEN is lazily updated by
+ * calling this function. If the current state is FREEZING but not FROZEN,
+ * this function checks whether all tasks of this cgroup and the descendant
+ * cgroups finished freezing and, if so, sets FROZEN.
+ *
+ * The caller is responsible for grabbing RCU read lock and calling
+ * update_if_frozen() on all descendants prior to invoking this function.
*
* Task states and freezer state might disagree while tasks are being
* migrated into or out of @cgroup, so we can't verify task states against
* @freezer state here. See freezer_attach() for details.
*/
-static void update_if_frozen(struct freezer *freezer)
+static void update_if_frozen(struct cgroup *cgroup)
{
- struct cgroup *cgroup = freezer->css.cgroup;
+ struct freezer *freezer = cgroup_freezer(cgroup);
+ struct cgroup *pos;
struct cgroup_iter it;
struct task_struct *task;
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ spin_lock_irq(&freezer->lock);
+
if (!(freezer->state & CGROUP_FREEZING) ||
(freezer->state & CGROUP_FROZEN))
- return;
+ goto out_unlock;
+ /* are all (live) children frozen? */
+ cgroup_for_each_child(pos, cgroup) {
+ struct freezer *child = cgroup_freezer(pos);
+
+ if ((child->state & CGROUP_FREEZER_ONLINE) &&
+ !(child->state & CGROUP_FROZEN))
+ goto out_unlock;
+ }
+
+ /* are all tasks frozen? */
cgroup_iter_start(cgroup, &it);
while ((task = cgroup_iter_next(cgroup, &it))) {
@@ -229,27 +292,32 @@ static void update_if_frozen(struct free
* the usual frozen condition.
*/
if (!frozen(task) && !freezer_should_skip(task))
- goto notyet;
+ goto out_iter_end;
}
}
freezer->state |= CGROUP_FROZEN;
-notyet:
+out_iter_end:
cgroup_iter_end(cgroup, &it);
+out_unlock:
+ spin_unlock_irq(&freezer->lock);
}
static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
struct seq_file *m)
{
- struct freezer *freezer = cgroup_freezer(cgroup);
- unsigned int state;
+ struct cgroup *pos;
- spin_lock_irq(&freezer->lock);
- update_if_frozen(freezer);
- state = freezer->state;
- spin_unlock_irq(&freezer->lock);
+ rcu_read_lock();
- seq_puts(m, freezer_state_strs(state));
+ /* update states bottom-up */
+ cgroup_for_each_descendant_post(pos, cgroup)
+ update_if_frozen(pos);
+ update_if_frozen(cgroup);
+
+ rcu_read_unlock();
+
+ seq_puts(m, freezer_state_strs(cgroup_freezer(cgroup)->state));
seq_putc(m, '\n');
return 0;
}
@@ -320,14 +388,39 @@ static void freezer_apply_state(struct f
* @freezer: freezer of interest
* @freeze: whether to freeze or thaw
*
- * Freeze or thaw @cgroup according to @freeze.
+ * Freeze or thaw @freezer according to @freeze. The operations are
+ * recursive - all descendants of @freezer will be affected.
*/
static void freezer_change_state(struct freezer *freezer, bool freeze)
{
+ struct cgroup *pos;
+
/* update @freezer */
spin_lock_irq(&freezer->lock);
freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Update all its descendants in pre-order traversal. Each
+ * descendant will try to inherit its parent's FREEZING state as
+ * CGROUP_FREEZING_PARENT.
+ */
+ rcu_read_lock();
+ cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
+ struct freezer *pos_f = cgroup_freezer(pos);
+ struct freezer *parent = parent_freezer(pos_f);
+
+ /*
+ * Our update to @parent->state is already visible which is
+ * all we need. No need to lock @parent. For more info on
+ * synchronization, see freezer_post_create().
+ */
+ spin_lock_irq(&pos_f->lock);
+ freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
+ CGROUP_FREEZING_PARENT);
+ spin_unlock_irq(&pos_f->lock);
+ }
+ rcu_read_unlock();
}
static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
@@ -390,12 +483,4 @@ struct cgroup_subsys freezer_subsys = {
.attach = freezer_attach,
.fork = freezer_fork,
.base_cftypes = files,
-
- /*
- * freezer subsys doesn't handle hierarchy at all. Frozen state
- * should be inherited through the hierarchy - if a parent is
- * frozen, all its children should be frozen. Fix it and remove
- * the following.
- */
- .broken_hierarchy = true,
};
^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 9/9 v3] cgroup_freezer: implement proper hierarchy support
[not found] ` <1351931915-1701-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-07 16:39 ` [PATCH 9/9 v2] " Tejun Heo
@ 2012-11-08 17:57 ` Tejun Heo
[not found] ` <20121108175750.GK12973-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
1 sibling, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2012-11-08 17:57 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.
This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.
freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.
Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.
As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.
Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned via .broken_hierarchy.
v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor. Fixed.
v3: Documentation/cgroups/freezer-subsystem.txt updated.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reviewed-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
Documentation/cgroups/freezer-subsystem.txt | 63 +++++++---
kernel/cgroup_freezer.c | 161 +++++++++++++++++++++-------
2 files changed, 165 insertions(+), 59 deletions(-)
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -49,13 +49,49 @@ prevent the freeze/unfreeze cycle from b
being frozen. This allows the bash example above and gdb to run as
expected.
-The freezer subsystem in the container filesystem defines a file named
-freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
-cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
-Reading will return the current state.
+The cgroup freezer is hierarchical. Freezing a cgroup freezes all
+tasks beloning to the cgroup and all its descendant cgroups. Each
+cgroup has its own state (self-state) and the state inherited from the
+parent (parent-state). Iff both states are THAWED, the cgroup is
+THAWED.
-Note freezer.state doesn't exist in root cgroup, which means root cgroup
-is non-freezable.
+The following cgroupfs files are created by cgroup freezer.
+
+* freezer.state: Read-write.
+
+ When read, returns the effective state of the cgroup - "THAWED",
+ "FREEZING" or "FROZEN". This is the combined self and parent-states.
+ If any is freezing, the cgroup is freezing (FREEZING or FROZEN).
+
+ FREEZING cgroup transitions into FROZEN state when all tasks
+ belonging to the cgroup and its descendants become frozen. Note that
+ a cgroup reverts to FREEZING from FROZEN after a new task is added
+ to the cgroup or one of its descendant cgroups until the new task is
+ frozen.
+
+ When written, sets the self-state of the cgroup. Two values are
+ allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
+ if not already freezing, enters FREEZING state along with all its
+ descendant cgroups.
+
+ If THAWED is written, the self-state of the cgroup is changed to
+ THAWED. Note that the effective state may not change to THAWED if
+ the parent-state is still freezing. If a cgroup's effective state
+ becomes THAWED, all its descendants which are freezing because of
+ the cgroup also leave the freezing state.
+
+* freezer.self_freezing: Read only.
+
+ Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
+ This value is 1 iff the last write to freezer.state was "FROZEN".
+
+* freezer.parent_freezing: Read only.
+
+ Shows the parent-state. 0 if none of the cgroup's ancestors is
+ frozen; otherwise, 1.
+
+The root cgroup is non-freezable and the above interface files don't
+exist.
* Examples of usage :
@@ -85,18 +121,3 @@ to unfreeze all tasks in the container :
This is the basic mechanism which should do the right thing for user space task
in a simple scenario.
-
-It's important to note that freezing can be incomplete. In that case we return
-EBUSY. This means that some tasks in the cgroup are busy doing something that
-prevents us from completely freezing the cgroup at this time. After EBUSY,
-the cgroup will remain partially frozen -- reflected by freezer.state reporting
-"FREEZING" when read. The state will remain "FREEZING" until one of these
-things happens:
-
- 1) Userspace cancels the freezing operation by writing "THAWED" to
- the freezer.state file
- 2) Userspace retries the freezing operation by writing "FROZEN" to
- the freezer.state file (writing "FREEZING" is not legal
- and returns EINVAL)
- 3) The tasks that blocked the cgroup from entering the "FROZEN"
- state disappear from the cgroup's set of tasks.
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,6 +22,13 @@
#include <linux/freezer.h>
#include <linux/seq_file.h>
+/*
+ * A cgroup is freezing if any FREEZING flags are set. FREEZING_SELF is
+ * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared
+ * for "THAWED". FREEZING_PARENT is set if the parent freezer is FREEZING
+ * for whatever reason. IOW, a cgroup has FREEZING_PARENT set if one of
+ * its ancestors has FREEZING_SELF set.
+ */
enum freezer_state_flags {
CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
@@ -50,6 +57,15 @@ static inline struct freezer *task_freez
struct freezer, css);
}
+static struct freezer *parent_freezer(struct freezer *freezer)
+{
+ struct cgroup *pcg = freezer->css.cgroup->parent;
+
+ if (pcg)
+ return cgroup_freezer(pcg);
+ return NULL;
+}
+
bool cgroup_freezing(struct task_struct *task)
{
bool ret;
@@ -74,17 +90,6 @@ static const char *freezer_state_strs(un
return "THAWED";
};
-/*
- * State diagram
- * Transitions are caused by userspace writes to the freezer.state file.
- * The values in parenthesis are state labels. The rest are edge labels.
- *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- * ^ ^ | |
- * | \_______THAWED_______/ |
- * \__________________________THAWED____________/
- */
-
struct cgroup_subsys freezer_subsys;
static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
@@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freez
* freezer_post_create - commit creation of a freezer cgroup
* @cgroup: cgroup being created
*
- * We're committing to creation of @cgroup. Mark it online.
+ * We're committing to creation of @cgroup. Mark it online and inherit
+ * parent's freezing state while holding both parent's and our
+ * freezer->lock.
*/
static void freezer_post_create(struct cgroup *cgroup)
{
struct freezer *freezer = cgroup_freezer(cgroup);
+ struct freezer *parent = parent_freezer(freezer);
+
+ /*
+ * The following double locking and freezing state inheritance
+ * guarantee that @cgroup can never escape ancestors' freezing
+ * states. See cgroup_for_each_descendant_pre() for details.
+ */
+ if (parent)
+ spin_lock_irq(&parent->lock);
+ spin_lock_nested(&freezer->lock, SINGLE_DEPTH_NESTING);
- spin_lock_irq(&freezer->lock);
freezer->state |= CGROUP_FREEZER_ONLINE;
- spin_unlock_irq(&freezer->lock);
+
+ if (parent && (parent->state & CGROUP_FREEZING)) {
+ freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
+ atomic_inc(&system_freezing_cnt);
+ }
+
+ spin_unlock(&freezer->lock);
+ if (parent)
+ spin_unlock_irq(&parent->lock);
}
/**
@@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup
{
struct freezer *freezer = cgroup_freezer(new_cgrp);
struct task_struct *task;
+ bool clear_frozen = false;
spin_lock_irq(&freezer->lock);
@@ -172,10 +197,25 @@ static void freezer_attach(struct cgroup
} else {
freeze_task(task);
freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = true;
}
}
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Propagate FROZEN clearing upwards. We may race with
+ * update_if_frozen(), but as long as both work bottom-up, either
+ * update_if_frozen() sees child's FROZEN cleared or we clear the
+ * parent's FROZEN later. No parent w/ !FROZEN children can be
+ * left FROZEN.
+ */
+ while (clear_frozen && (freezer = parent_freezer(freezer))) {
+ spin_lock_irq(&freezer->lock);
+ freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = freezer->state & CGROUP_FREEZING;
+ spin_unlock_irq(&freezer->lock);
+ }
}
static void freezer_fork(struct task_struct *task)
@@ -200,24 +240,47 @@ out:
rcu_read_unlock();
}
-/*
- * We change from FREEZING to FROZEN lazily if the cgroup was only
- * partially frozen when we exitted write. Caller must hold freezer->lock.
+/**
+ * update_if_frozen - update whether a cgroup finished freezing
+ * @cgroup: cgroup of interest
+ *
+ * Once FREEZING is initiated, transition to FROZEN is lazily updated by
+ * calling this function. If the current state is FREEZING but not FROZEN,
+ * this function checks whether all tasks of this cgroup and the descendant
+ * cgroups finished freezing and, if so, sets FROZEN.
+ *
+ * The caller is responsible for grabbing RCU read lock and calling
+ * update_if_frozen() on all descendants prior to invoking this function.
*
* Task states and freezer state might disagree while tasks are being
* migrated into or out of @cgroup, so we can't verify task states against
* @freezer state here. See freezer_attach() for details.
*/
-static void update_if_frozen(struct freezer *freezer)
+static void update_if_frozen(struct cgroup *cgroup)
{
- struct cgroup *cgroup = freezer->css.cgroup;
+ struct freezer *freezer = cgroup_freezer(cgroup);
+ struct cgroup *pos;
struct cgroup_iter it;
struct task_struct *task;
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ spin_lock_irq(&freezer->lock);
+
if (!(freezer->state & CGROUP_FREEZING) ||
(freezer->state & CGROUP_FROZEN))
- return;
+ goto out_unlock;
+ /* are all (live) children frozen? */
+ cgroup_for_each_child(pos, cgroup) {
+ struct freezer *child = cgroup_freezer(pos);
+
+ if ((child->state & CGROUP_FREEZER_ONLINE) &&
+ !(child->state & CGROUP_FROZEN))
+ goto out_unlock;
+ }
+
+ /* are all tasks frozen? */
cgroup_iter_start(cgroup, &it);
while ((task = cgroup_iter_next(cgroup, &it))) {
@@ -229,27 +292,32 @@ static void update_if_frozen(struct free
* the usual frozen condition.
*/
if (!frozen(task) && !freezer_should_skip(task))
- goto notyet;
+ goto out_iter_end;
}
}
freezer->state |= CGROUP_FROZEN;
-notyet:
+out_iter_end:
cgroup_iter_end(cgroup, &it);
+out_unlock:
+ spin_unlock_irq(&freezer->lock);
}
static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
struct seq_file *m)
{
- struct freezer *freezer = cgroup_freezer(cgroup);
- unsigned int state;
+ struct cgroup *pos;
- spin_lock_irq(&freezer->lock);
- update_if_frozen(freezer);
- state = freezer->state;
- spin_unlock_irq(&freezer->lock);
+ rcu_read_lock();
- seq_puts(m, freezer_state_strs(state));
+ /* update states bottom-up */
+ cgroup_for_each_descendant_post(pos, cgroup)
+ update_if_frozen(pos);
+ update_if_frozen(cgroup);
+
+ rcu_read_unlock();
+
+ seq_puts(m, freezer_state_strs(cgroup_freezer(cgroup)->state));
seq_putc(m, '\n');
return 0;
}
@@ -320,14 +388,39 @@ static void freezer_apply_state(struct f
* @freezer: freezer of interest
* @freeze: whether to freeze or thaw
*
- * Freeze or thaw @cgroup according to @freeze.
+ * Freeze or thaw @freezer according to @freeze. The operations are
+ * recursive - all descendants of @freezer will be affected.
*/
static void freezer_change_state(struct freezer *freezer, bool freeze)
{
+ struct cgroup *pos;
+
/* update @freezer */
spin_lock_irq(&freezer->lock);
freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Update all its descendants in pre-order traversal. Each
+ * descendant will try to inherit its parent's FREEZING state as
+ * CGROUP_FREEZING_PARENT.
+ */
+ rcu_read_lock();
+ cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
+ struct freezer *pos_f = cgroup_freezer(pos);
+ struct freezer *parent = parent_freezer(pos_f);
+
+ /*
+ * Our update to @parent->state is already visible which is
+ * all we need. No need to lock @parent. For more info on
+ * synchronization, see freezer_post_create().
+ */
+ spin_lock_irq(&pos_f->lock);
+ freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
+ CGROUP_FREEZING_PARENT);
+ spin_unlock_irq(&pos_f->lock);
+ }
+ rcu_read_unlock();
}
static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
@@ -390,12 +483,4 @@ struct cgroup_subsys freezer_subsys = {
.attach = freezer_attach,
.fork = freezer_fork,
.base_cftypes = files,
-
- /*
- * freezer subsys doesn't handle hierarchy at all. Frozen state
- * should be inherited through the hierarchy - if a parent is
- * frozen, all its children should be frozen. Fix it and remove
- * the following.
- */
- .broken_hierarchy = true,
};
^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCHSET cgroup/for-3.8] cgroup_freezer: implement proper hierarchy support
[not found] ` <1351931915-1701-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
` (8 preceding siblings ...)
2012-11-03 8:38 ` [PATCH 9/9] cgroup_freezer: implement proper hierarchy support Tejun Heo
@ 2012-11-09 17:15 ` Tejun Heo
9 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2012-11-09 17:15 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA
On Sat, Nov 03, 2012 at 01:38:26AM -0700, Tejun Heo wrote:
> Hello,
>
> This patchset implement proper hierarchy support for cgroup_freezer as
> discussed in "[RFC] cgroup TODOs"[1].
Applied to cgroup/for-3.8. Rafael, I applied the cgroup_freezer
changes there too as there already are and will be more dependencies
between cgroup_freezer and cgroup changes, and the cgroup_freezer
changes don't really affect the rest of the freezer. If you'd like
them to be routed differently, please let me know.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 74+ messages in thread