* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-14 21:44 ` Tejun Heo
@ 2014-04-15 0:57 ` Li Zefan
-1 siblings, 0 replies; 36+ messages in thread
From: Li Zefan @ 2014-04-15 0:57 UTC (permalink / raw)
To: Tejun Heo
Cc: containers, cgroups, linux-kernel, john, rlove, eparis, gregkh,
serge.hallyn, lennart, kay
On 2014/4/15 5:44, Tejun Heo wrote:
> cgroup users often need a way to determine when a cgroup's
> subhierarchy becomes empty so that it can be cleaned up. cgroup
> currently provides release_agent for it; unfortunately, this mechanism
> is riddled with issues.
>
> * It delivers events by forking and execing a userland binary
> specified as the release_agent. This is a long deprecated method of
> notification delivery. It's extremely heavy, slow and cumbersome to
> integrate with larger infrastructure.
>
> * There is a single monitoring point at the root. There's no way to
> delegate management of a subtree.
>
> * The event isn't recursive. It triggers when a cgroup doesn't have
> any tasks or child cgroups. Events for internal nodes trigger only
> after all children are removed. This again makes it impossible to
> delegate management of a subtree.
>
> * Events are filtered from the kernel side. "notify_on_release" file
> is used to subscribe to or suppress release event. This is
> unnecessarily complicated and probably done this way because event
> delivery itself was expensive.
>
> This patch implements interface file "cgroup.subtree_populated" which
> can be used to monitor whether the cgroup's subhierarchy has tasks in
> it or not. Its value is 0 if there is no task in the cgroup and its
> descendants; otherwise, 1,
Is cgroup.tree_populated a better name?
cgroup.subtree_control controls child cgroups only, but .subtree_populated
shows 1 if there're tasks in the cgroup or its children, so the two
are a bit inconsistent to me.
> and kernfs_notify() notification is
> triggered when the value changes, which can be monitored through poll
> and [di]notify.
>
> This is a lot lighter and simpler and trivially allows delegating
> management of subhierarchy - subhierarchy monitoring can block further
> propagation simply by putting itself or another process in the root of
> the subhierarchy and monitoring events that it's interested in from there
> without interfering with monitoring higher in the tree.
>
> v2: Patch description updated as per Serge.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> Cc: Lennart Poettering <lennart@poettering.net>
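As a rough illustration of how the interface described in the patch could be consumed without a release agent, here is a POSIX sh sketch. It assumes the file name used in this version of the patch (cgroup.subtree_populated, which may still be renamed), assumes inotifywait from inotify-tools is installed, and takes at face value the patch's statement that value changes are visible through inotify. The function names are hypothetical.

```sh
# POSIX sh sketch. The flag file name is the one proposed in this patch;
# it is still under discussion, so it can be overridden from the environment.
CG_POPULATED_FILE=${CG_POPULATED_FILE:-cgroup.subtree_populated}

# Exit status 0 if the cgroup directory $1 or any of its descendants has
# tasks (flag reads 1), 2 if the flag file cannot be read, 1 otherwise.
cg_is_populated() {
    val=$(cat "$1/$CG_POPULATED_FILE" 2>/dev/null) || return 2
    [ "$val" = 1 ]
}

# Block until the subtree rooted at $1 has no tasks left.
# Assumes inotifywait (inotify-tools) is installed and that a change of
# the value is reported as a modify event, as the patch description says.
cg_wait_empty() {
    while :; do
        if cg_is_populated "$1"; then
            # Wait for a change, but re-read at least every 5 seconds so a
            # change that lands between the read and the watch setup is
            # not missed forever. On timeout or error, back off briefly.
            inotifywait -qq -t 5 -e modify "$1/$CG_POPULATED_FILE" || sleep 1
        else
            # Status 1 means the subtree is empty; anything else is an error.
            [ $? -eq 1 ] && return 0
            return 2
        fi
    done
}
```

Usage would be something like `cg_wait_empty /path/to/cgroup && rmdir /path/to/cgroup` (path hypothetical). The periodic re-read is deliberate: a shell script cannot set up the watch atomically with the read, so a bounded timeout is the simplest way to avoid waiting forever on a missed event.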
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-15 0:57 ` Li Zefan
@ 2014-04-15 14:54 ` Tejun Heo
-1 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2014-04-15 14:54 UTC (permalink / raw)
To: Li Zefan
Cc: containers, cgroups, linux-kernel, john, rlove, eparis, gregkh,
serge.hallyn, lennart, kay
Hello,
On Tue, Apr 15, 2014 at 08:57:21AM +0800, Li Zefan wrote:
> Is cgroup.tree_populated a better name?
>
> cgroup.subtree_control controls child cgroups only, but .subtree_populated
> shows 1 if there're tasks in the cgroup or its children, so the two
> are a bit inconsistent to me.
Yes, good catch. subtree_control affects subtree proper.
subtree_populated covers self too. The difference is subtle and the
trade-off between a shared naming pattern and clarifying the subtlety
didn't seem clear-cut to me. Hmmm....
--
tejun
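To make the subtlety concrete: the populated flag is about the cgroup itself plus all of its descendants, whereas subtree_control only affects the children. The POSIX sh sketch below spells out that definition from userspace by scanning the standard cgroup.procs files. It only illustrates the semantics and is not meant as a model of how the kernel maintains the flag; the function name is hypothetical.

```sh
# Illustration only: prints 1 if the cgroup directory $1 itself, or any
# cgroup below it, lists at least one process in cgroup.procs, else 0.
# Note that find includes $1's own cgroup.procs, i.e. "covers self too".
cg_populated_by_hand() {
    if find "$1" -type f -name cgroup.procs -exec cat {} + 2>/dev/null | grep -q .; then
        echo 1
    else
        echo 0
    fi
}
```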
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-15 14:54 ` Tejun Heo
@ 2014-04-15 16:52 ` Tejun Heo
-1 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2014-04-15 16:52 UTC (permalink / raw)
To: Li Zefan
Cc: containers, cgroups, linux-kernel, john, rlove, eparis, gregkh,
serge.hallyn, lennart, kay
On Tue, Apr 15, 2014 at 10:54:50AM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 15, 2014 at 08:57:21AM +0800, Li Zefan wrote:
> > Is cgroup.tree_populated a better name?
> >
> > cgroup.subtree_control controls child cgroups only, but .subtree_populated
> > shows 1 if there're tasks in the cgroup or its children, so the two
> > are a bit inconsistent to me.
>
> Yes, good catch. subtree_control affects subtree proper.
> subtree_populated covers self too. The difference is subtle and the
> trade-off between a shared naming pattern and clarifying the subtlety
> didn't seem clear-cut to me. Hmmm....
How about cgroup.populated?
--
tejun
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
@ 2014-04-16 1:30 ` Li Zefan
2014-04-16 1:30 ` Li Zefan
1 sibling, 0 replies; 36+ messages in thread
From: Li Zefan @ 2014-04-16 1:30 UTC (permalink / raw)
To: Tejun Heo
Cc: containers, cgroups, linux-kernel, john, rlove, eparis, gregkh,
serge.hallyn, lennart, kay
On 2014/4/16 0:52, Tejun Heo wrote:
> On Tue, Apr 15, 2014 at 10:54:50AM -0400, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Apr 15, 2014 at 08:57:21AM +0800, Li Zefan wrote:
>>> Is cgroup.tree_populated a better name?
>>>
>>> cgroup.subtree_control controls child cgroups only, but .subtree_populated
>>> shows 1 if there're tasks in the cgroup or its children, so the two
>>> are a bit inconsistent to me.
>>
>> Yes, good catch. subtree_control affects subtree proper.
>> subtree_populated covers self too. The difference is subtle and the
>> trade-off between a shared naming pattern and clarifying the subtlety
>> didn't seem clear-cut to me. Hmmm....
>
> How about cgroup.populated?
>
Yeah, fine for me.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-14 21:44 ` Tejun Heo
@ 2014-04-16 2:48 ` Li Zefan
-1 siblings, 0 replies; 36+ messages in thread
From: Li Zefan @ 2014-04-16 2:48 UTC (permalink / raw)
To: Tejun Heo
Cc: containers, cgroups, linux-kernel, john, rlove, eparis, gregkh,
serge.hallyn, lennart, kay
Hi Tejun,
On 2014/4/15 5:44, Tejun Heo wrote:
> cgroup users often need a way to determine when a cgroup's
> subhierarchy becomes empty so that it can be cleaned up. cgroup
> currently provides release_agent for it; unfortunately, this mechanism
> is riddled with issues.
>
> * It delivers events by forking and execing a userland binary
> specified as the release_agent. This is a long deprecated method of
> notification delivery. It's extremely heavy, slow and cumbersome to
> integrate with larger infrastructure.
>
> * There is a single monitoring point at the root. There's no way to
> delegate management of a subtree.
>
> * The event isn't recursive. It triggers when a cgroup doesn't have
> any tasks or child cgroups. Events for internal nodes trigger only
> after all children are removed. This again makes it impossible to
> delegate management of a subtree.
>
> * Events are filtered from the kernel side. "notify_on_release" file
> is used to subscribe to or suppress release event. This is
> unnecessarily complicated and probably done this way because event
> delivery itself was expensive.
>
> This patch implements interface file "cgroup.subtree_populated" which
> can be used to monitor whether the cgroup's subhierarchy has tasks in
> it or not. Its value is 0 if there is no task in the cgroup and its
> descendants; otherwise, 1, and kernfs_notify() notification is
> triggered when the value changes, which can be monitored through poll
> and [di]notify.
>
For the old notification mechanism, the path of the cgroup that becomes
empty will be passed to the user specified release agent. Like this:
# cat /sbin/cpuset_release_agent
#!/bin/sh
rmdir "/dev/cpuset/$1"
How do we achieve this using inotify?
- monitor all the cgroups, or
- monitor all the leaf cgroups, and walk up cgrp->parent to delete all
empty cgroups.
- monitor root cgroup only, and traverse the whole hierarchy to find
empty cgroups when it gets an fs event.
None of them seems scalable.
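The second option above (watch the leaves, then walk up through the parents) could look roughly like the POSIX sh sketch below once a notification arrives for a leaf. It is a hypothetical helper, not part of any existing tool, and it relies on cgroupfs refusing rmdir of a cgroup that still has tasks or child cgroups, so the walk stops at the first ancestor that is still in use.

```sh
# Hypothetical helper. $1: mount point of the hierarchy,
# $2: a cgroup below it that has just been reported empty.
# rmdir on cgroupfs fails while a cgroup still has tasks or child
# cgroups, so the walk stops at the first ancestor that is still in use.
cg_prune_upward() {
    root=${1%/}
    dir=${2%/}
    # Only ever remove directories strictly below $root.
    while [ "$dir" != "$root" ] && [ "${dir#"$root"/}" != "$dir" ]; do
        rmdir "$dir" 2>/dev/null || break
        dir=${dir%/*}
    done
    return 0
}
```

Because removing a non-empty cgroup simply fails, the helper needs no separate emptiness check; a cgroup that has been repopulated in the meantime just stops the walk.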
> This is a lot lighter and simpler and trivially allows delegating
> management of subhierarchy - subhierarchy monitoring can block further
> propagation simply by putting itself or another process in the root of
> the subhierarchy and monitoring events that it's interested in from there
> without interfering with monitoring higher in the tree.
>
> v2: Patch description updated as per Serge.
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-16 2:48 ` Li Zefan
@ 2014-04-16 3:33 ` Kay Sievers
-1 siblings, 0 replies; 36+ messages in thread
From: Kay Sievers @ 2014-04-16 3:33 UTC (permalink / raw)
To: Li Zefan
Cc: Tejun Heo, containers, cgroups, LKML, john, rlove, eparis,
Greg Kroah-Hartman, serge.hallyn, Lennart Poettering
On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan <lizefan@huawei.com> wrote:
> On 2014/4/15 5:44, Tejun Heo wrote:
>> cgroup users often need a way to determine when a cgroup's
>> subhierarchy becomes empty so that it can be cleaned up. cgroup
>> currently provides release_agent for it; unfortunately, this mechanism
>> is riddled with issues.
>>
>> * It delivers events by forking and execing a userland binary
>> specified as the release_agent. This is a long deprecated method of
>> notification delivery. It's extremely heavy, slow and cumbersome to
>> integrate with larger infrastructure.
>>
>> * There is a single monitoring point at the root. There's no way to
>> delegate management of a subtree.
>>
>> * The event isn't recursive. It triggers when a cgroup doesn't have
>> any tasks or child cgroups. Events for internal nodes trigger only
>> after all children are removed. This again makes it impossible to
>> delegate management of a subtree.
>>
>> * Events are filtered from the kernel side. "notify_on_release" file
>> is used to subscribe to or suppress release event. This is
>> unnecessarily complicated and probably done this way because event
>> delivery itself was expensive.
>>
>> This patch implements interface file "cgroup.subtree_populated" which
>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>> it or not. Its value is 0 if there is no task in the cgroup and its
>> descendants; otherwise, 1, and kernfs_notify() notification is
>> triggered when the value changes, which can be monitored through poll
>> and [di]notify.
>>
>
> For the old notification mechanism, the path of the cgroup that becomes
> empty will be passed to the user specified release agent. Like this:
>
> # cat /sbin/cpuset_release_agent
> #!/bin/sh
> rmdir "/dev/cpuset/$1"
>
> How do we achieve this using inotify?
>
> - monitor all the cgroups, or
> - monitor all the leaf cgroups, and walk up cgrp->parent to delete all
> empty cgroups.
> - monitor root cgroup only, and traverse the whole hierarchy to find
> empty cgroups when it gets an fs event.
>
> None of them seems scalable.
The manager would add all cgroups as watches to one inotify file
descriptor; it should not be a problem to do that.
Kay
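The suggestion above relies on one inotify instance being able to hold many watches. With inotify-tools, a single inotifywait process is one such instance, so a manager could be sketched in POSIX sh as below. This assumes inotifywait is installed, that modify events are actually delivered for these files on cgroupfs, and the file name from this version of the patch; the function names are hypothetical.

```sh
CG_POPULATED_FILE=${CG_POPULATED_FILE:-cgroup.subtree_populated}

# List every flag file below the hierarchy root $1, one path per line.
cg_watch_list() {
    find "$1" -type f -name "$CG_POPULATED_FILE"
}

# Turn one event (the path of a flag file) into "<cgroup dir> <value>".
cg_report() {
    # The cgroup may already be gone by the time the event is handled.
    val=$(cat "$1" 2>/dev/null) || return 0
    printf '%s %s\n' "${1%/*}" "$val"
}

# One inotifywait process == one inotify instance holding one watch per
# flag file that existed when it started; %w prints the watched path.
cg_watch_all() {
    cg_watch_list "$1" |
        inotifywait -m -q -e modify --format '%w' --fromfile - |
        while IFS= read -r f; do
            cg_report "$f"
        done
}
```

Two limitations are left out deliberately: cgroups created after the watch list was built are not watched, and changes between the find and the watch setup can be missed. A real manager would need to handle both.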
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
@ 2014-04-16 3:50 ` Eric W. Biederman
2014-04-16 3:50 ` Eric W. Biederman
2014-04-16 4:16 ` Li Zefan
2 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2014-04-16 3:50 UTC (permalink / raw)
To: Kay Sievers
Cc: rlove, Greg Kroah-Hartman, containers, serge.hallyn, LKML,
Lennart Poettering, cgroups, Tejun Heo, eparis, john
Kay Sievers <kay@vrfy.org> writes:
> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan <lizefan@huawei.com> wrote:
>> On 2014/4/15 5:44, Tejun Heo wrote:
>>> cgroup users often need a way to determine when a cgroup's
>>> subhierarchy becomes empty so that it can be cleaned up. cgroup
>>> currently provides release_agent for it; unfortunately, this mechanism
>>> is riddled with issues.
>>>
>>> * It delivers events by forking and execing a userland binary
>>> specified as the release_agent. This is a long deprecated method of
>>> notification delivery. It's extremely heavy, slow and cumbersome to
>>> integrate with larger infrastructure.
>>>
>>> * There is a single monitoring point at the root. There's no way to
>>> delegate management of a subtree.
>>>
>>> * The event isn't recursive. It triggers when a cgroup doesn't have
>>> any tasks or child cgroups. Events for internal nodes trigger only
>>> after all children are removed. This again makes it impossible to
>>> delegate management of a subtree.
>>>
>>> * Events are filtered from the kernel side. "notify_on_release" file
>>> is used to subscribe to or suppress release event. This is
>>> unnecessarily complicated and probably done this way because event
>>> delivery itself was expensive.
>>>
>>> This patch implements interface file "cgroup.subtree_populated" which
>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>> it or not. Its value is 0 if there is no task in the cgroup and its
>>> descendants; otherwise, 1, and kernfs_notify() notification is
>>> triggered when the value changes, which can be monitored through poll
>>> and [di]notify.
>>>
>>
>> For the old notification mechanism, the path of the cgroup that becomes
>> empty will be passed to the user specified release agent. Like this:
>>
>> # cat /sbin/cpuset_release_agent
>> #!/bin/sh
>> rmdir "/dev/cpuset/$1"
>>
>> How do we achieve this using inotify?
>>
>> - monitor all the cgroups, or
>> - monitor all the leaf cgroups, and walk up cgrp->parent to delete all
>> empty cgroups.
>> - monitor root cgroup only, and traverse the whole hierarchy to find
>> empty cgroups when it gets an fs event.
>>
>> None of them seems scalable.
>
> The manager would add all cgroups as watches to one inotify file
> descriptor; it should not be a problem to do that.
inotify won't work on cgroupfs.
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-16 3:33 ` Kay Sievers
@ 2014-04-16 3:50 ` Eric W. Biederman
-1 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2014-04-16 3:50 UTC (permalink / raw)
To: Kay Sievers
Cc: Li Zefan, rlove-L7G0xEPcOZbYtjvyW6yDsg, Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA, LKML, Lennart Poettering,
eparis-FjpueFixGhCM4zKIHC2jIg, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA,
john-jueV0HHMeujJJrXXpGQQMAC/G2K4zDHf
Kay Sievers <kay-tD+1rO4QERM@public.gmane.org> writes:
> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>> On 2014/4/15 5:44, Tejun Heo wrote:
>>> cgroup users often need a way to determine when a cgroup's
>>> subhierarchy becomes empty so that it can be cleaned up. cgroup
>>> currently provides release_agent for it; unfortunately, this mechanism
>>> is riddled with issues.
>>>
>>> * It delivers events by forking and execing a userland binary
>>> specified as the release_agent. This is a long deprecated method of
>>> notification delivery. It's extremely heavy, slow and cumbersome to
>>> integrate with larger infrastructure.
>>>
>>> * There is single monitoring point at the root. There's no way to
>>> delegate management of subtree.
>>>
>>> * The event isn't recursive. It triggers when a cgroup doesn't have
>>> any tasks or child cgroups. Events for internal nodes trigger only
>>> after all children are removed. This again makes it impossible to
>>> delegate management of subtree.
>>>
>>> * Events are filtered from the kernel side. "notify_on_release" file
>>> is used to subscribe to or suppress release event. This is
>>> unnecessarily complicated and probably done this way because event
>>> delivery itself was expensive.
>>>
>>> This patch implements interface file "cgroup.subtree_populated" which
>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>> it or not. Its value is 0 if there is no task in the cgroup and its
>>> descendants; otherwise, 1, and a kernfs_notify() notification is
>>> triggered when the value changes, which can be monitored through poll
>>> and [di]notify.
>>>
>>
>> For the old notification mechanism, the path of the cgroup that becomes
>> empty will be passed to the user specified release agent. Like this:
>>
>> # cat /sbin/cpuset_release_agent
>> #!/bin/sh
>> rmdir /dev/cpuset/$1
>>
>> How do we achieve this using inotify?
>>
>> - monitor all the cgroups, or
>> - monitor all the leaf cgroups, and walk up cgrp->parent to delete all
>> empty cgroups, or
>> - monitor the root cgroup only, and walk the whole hierarchy to find
>> empty cgroups when it gets an fs event.
>>
>> None of them seems scalable.
>
> The manager would add all cgroups as watches to one inotify file
> descriptor; it should not be a problem to do that.
inotify won't work on cgroupfs.
Eric
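Li Zefan's second option (watch leaf cgroups and walk up cgrp->parent deleting the empty ones) could look roughly like this in userland. This is a sketch under stated assumptions, not anyone's actual manager: prune_empty_upwards() is a hypothetical helper, paths are assumed to have no trailing slash, and it relies on rmdir() failing for a cgroup that still has tasks or children (typically EBUSY on cgroupfs, ENOTEMPTY on an ordinary directory). Once one directory cannot be removed, its ancestors cannot be empty either, so the walk stops there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove @leaf, then each ancestor directory, stopping
 * at the first rmdir() failure or when @root is reached. @root itself is
 * never removed. Returns the number of directories removed, or -1 if @leaf
 * is not strictly below @root.
 */
static int prune_empty_upwards(const char *root, const char *leaf)
{
    char path[PATH_MAX];
    size_t root_len = strlen(root);
    int removed = 0;

    if (strncmp(leaf, root, root_len) != 0 || leaf[root_len] != '/')
        return -1;
    if (snprintf(path, sizeof(path), "%s", leaf) >= (int)sizeof(path))
        return -1;

    while (strlen(path) > root_len) {
        if (rmdir(path) != 0)
            break;  /* still has tasks/children, so ancestors are busy too */
        removed++;

        /* Chop the last path component to move on to the parent. */
        char *slash = strrchr(path, '/');
        assert(slash != NULL);
        *slash = '\0';
    }
    return removed;
}
```

Standing in for the old /sbin/cpuset_release_agent, a manager would call prune_empty_upwards() with its mount point (for example the /dev/cpuset used above) and the path of the cgroup whose populated state just dropped to 0.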
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-16 3:50 ` Eric W. Biederman
@ 2014-04-16 4:15 ` Kay Sievers
-1 siblings, 0 replies; 36+ messages in thread
From: Kay Sievers @ 2014-04-16 4:15 UTC (permalink / raw)
To: Eric W. Biederman
Cc: rlove-L7G0xEPcOZbYtjvyW6yDsg, Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA, LKML, Lennart Poettering,
cgroups-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
eparis-FjpueFixGhCM4zKIHC2jIg,
john-jueV0HHMeujJJrXXpGQQMAC/G2K4zDHf
On Tue, Apr 15, 2014 at 8:50 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Kay Sievers <kay-tD+1rO4QERM@public.gmane.org> writes:
>
>> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>>> On 2014/4/15 5:44, Tejun Heo wrote:
>>>> cgroup users often need a way to determine when a cgroup's
>>>> subhierarchy becomes empty so that it can be cleaned up. cgroup
>>>> currently provides release_agent for it; unfortunately, this mechanism
>>>> is riddled with issues.
>>>>
>>>> * It delivers events by forking and execing a userland binary
>>>> specified as the release_agent. This is a long deprecated method of
>>>> notification delivery. It's extremely heavy, slow and cumbersome to
>>>> integrate with larger infrastructure.
>>>>
>>>> * There is single monitoring point at the root. There's no way to
>>>> delegate management of subtree.
>>>>
>>>> * The event isn't recursive. It triggers when a cgroup doesn't have
>>>> any tasks or child cgroups. Events for internal nodes trigger only
>>>> after all children are removed. This again makes it impossible to
>>>> delegate management of subtree.
>>>>
>>>> * Events are filtered from the kernel side. "notify_on_release" file
>>>> is used to subscribe to or suppress release event. This is
>>>> unnecessarily complicated and probably done this way because event
>>>> delivery itself was expensive.
>>>>
>>>> This patch implements interface file "cgroup.subtree_populated" which
>>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>>> it or not. Its value is 0 if there is no task in the cgroup and its
>>>> descendants; otherwise, 1, and a kernfs_notify() notification is
>>>> triggered when the value changes, which can be monitored through poll
>>>> and [di]notify.
>>>>
>>>
>>> For the old notification mechanism, the path of the cgroup that becomes
>>> empty will be passed to the user specified release agent. Like this:
>>>
>>> # cat /sbin/cpuset_release_agent
>>> #!/bin/sh
>>> rmdir /dev/cpuset/$1
>>>
>>> How do we achieve this using inotify?
>>>
>>> - monitor all the cgroups, or
>>> - monitor all the leaf cgroups, and walk up cgrp->parent to delete all
>>> empty cgroups, or
>>> - monitor the root cgroup only, and walk the whole hierarchy to find
>>> empty cgroups when it gets an fs event.
>>>
>>> None of them seems scalable.
>>
>> The manager would add all cgroups as watches to one inotify file
>> descriptor; it should not be a problem to do that.
>
> inotify won't work on cgroupfs.
Inotify on kernfs will work.
Kay
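Kay's suggestion (every cgroup's file watched through a single inotify file descriptor) might be sketched as below. It assumes, as Kay asserts, that kernfs_notify() on cgroup.populated reaches inotify as an IN_MODIFY event; this thread does not itself establish that. struct watch_set, MAX_WATCHED and the watch_set_*() helpers are hypothetical names, and the fixed-size table with a linear lookup exists only to keep the sketch short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

#define MAX_WATCHED 64  /* hypothetical fixed capacity for the sketch */

/* Hypothetical bookkeeping: map each watch descriptor back to its path. */
struct watch_set {
    int fd;                             /* the single inotify instance */
    int nr;                             /* number of watches in use */
    int wd[MAX_WATCHED];
    char path[MAX_WATCHED][PATH_MAX];
};

static int watch_set_init(struct watch_set *ws)
{
    ws->nr = 0;
    ws->fd = inotify_init1(IN_CLOEXEC);
    return ws->fd < 0 ? -1 : 0;
}

/* Watch @path for IN_MODIFY on the shared fd; returns the wd or -1. */
static int watch_set_add(struct watch_set *ws, const char *path)
{
    if (ws->nr >= MAX_WATCHED)
        return -1;
    int wd = inotify_add_watch(ws->fd, path, IN_MODIFY);
    if (wd < 0)
        return -1;
    ws->wd[ws->nr] = wd;
    snprintf(ws->path[ws->nr], PATH_MAX, "%s", path);
    ws->nr++;
    return wd;
}

static const char *watch_set_lookup(const struct watch_set *ws, int wd)
{
    for (int i = 0; i < ws->nr; i++)
        if (ws->wd[i] == wd)
            return ws->path[i];
    return NULL;
}

/*
 * Block until events arrive and return the path named by the first event
 * in the batch; the caller would then re-read that file's value.
 */
static const char *watch_set_wait(const struct watch_set *ws)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(ws->fd, buf, sizeof(buf));

    if (len < (ssize_t)sizeof(struct inotify_event))
        return NULL;
    const struct inotify_event *ev = (const struct inotify_event *)buf;
    return watch_set_lookup(ws, ev->wd);
}
```

A manager would loop on watch_set_wait(), re-read the file whose path it returns, and clean up when the value reads 0. With thousands of cgroups, the table would more sensibly be a hash keyed by watch descriptor.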
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-16 3:50 ` Eric W. Biederman
@ 2014-04-16 4:20 ` Li Zefan
-1 siblings, 0 replies; 36+ messages in thread
From: Li Zefan @ 2014-04-16 4:20 UTC (permalink / raw)
To: Eric W. Biederman
Cc: rlove-L7G0xEPcOZbYtjvyW6yDsg, Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA, Kay Sievers, LKML,
Lennart Poettering, cgroups-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
eparis-FjpueFixGhCM4zKIHC2jIg,
john-jueV0HHMeujJJrXXpGQQMAC/G2K4zDHf
On 2014/4/16 11:50, Eric W. Biederman wrote:
> Kay Sievers <kay-tD+1rO4QERM@public.gmane.org> writes:
>
>> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>>> On 2014/4/15 5:44, Tejun Heo wrote:
>>>> cgroup users often need a way to determine when a cgroup's
>>>> subhierarchy becomes empty so that it can be cleaned up. cgroup
>>>> currently provides release_agent for it; unfortunately, this mechanism
>>>> is riddled with issues.
>>>>
>>>> * It delivers events by forking and execing a userland binary
>>>> specified as the release_agent. This is a long deprecated method of
>>>> notification delivery. It's extremely heavy, slow and cumbersome to
>>>> integrate with larger infrastructure.
>>>>
>>>> * There is single monitoring point at the root. There's no way to
>>>> delegate management of subtree.
>>>>
>>>> * The event isn't recursive. It triggers when a cgroup doesn't have
>>>> any tasks or child cgroups. Events for internal nodes trigger only
>>>> after all children are removed. This again makes it impossible to
>>>> delegate management of subtree.
>>>>
>>>> * Events are filtered from the kernel side. "notify_on_release" file
>>>> is used to subscribe to or suppress release event. This is
>>>> unnecessarily complicated and probably done this way because event
>>>> delivery itself was expensive.
>>>>
>>>> This patch implements interface file "cgroup.subtree_populated" which
>>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>>> it or not. Its value is 0 if there is no task in the cgroup and its
>>>> descendants; otherwise, 1, and a kernfs_notify() notification is
>>>> triggered when the value changes, which can be monitored through poll
>>>> and [di]notify.
>>>>
>>>
>>> For the old notification mechanism, the path of the cgroup that becomes
>>> empty will be passed to the user specified release agent. Like this:
>>>
>>> # cat /sbin/cpuset_release_agent
>>> #!/bin/sh
>>> rmdir /dev/cpuset/$1
>>>
>>> How do we achieve this using inotify?
>>>
>>> - monitor all the cgroups, or
>>> - monitor all the leaf cgroups, and walk up cgrp->parent to delete all
>>> empty cgroups, or
>>> - monitor the root cgroup only, and walk the whole hierarchy to find
>>> empty cgroups when it gets an fs event.
>>>
>>> None of them seems scalable.
>>
>> The manager would add all cgroups as watches to one inotify file
>> descriptor; it should not be a problem to do that.
>
> inotify won't work on cgroupfs.
>
Tejun's working on inotify support for cgroupfs, and I believe this patchset
has been tested, so it works.
So what do you mean by saying it won't work? Could you be more specific?
* Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
2014-04-16 3:33 ` Kay Sievers
@ 2014-04-16 4:16 ` Li Zefan
-1 siblings, 0 replies; 36+ messages in thread
From: Li Zefan @ 2014-04-16 4:16 UTC (permalink / raw)
To: Kay Sievers
Cc: rlove-L7G0xEPcOZbYtjvyW6yDsg, Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA, LKML, Lennart Poettering,
eparis-FjpueFixGhCM4zKIHC2jIg, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA,
john-jueV0HHMeujJJrXXpGQQMAC/G2K4zDHf
On 2014/4/16 11:33, Kay Sievers wrote:
> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>> On 2014/4/15 5:44, Tejun Heo wrote:
>>> cgroup users often need a way to determine when a cgroup's
>>> subhierarchy becomes empty so that it can be cleaned up. cgroup
>>> currently provides release_agent for it; unfortunately, this mechanism
>>> is riddled with issues.
>>>
>>> * It delivers events by forking and execing a userland binary
>>> specified as the release_agent. This is a long deprecated method of
>>> notification delivery. It's extremely heavy, slow and cumbersome to
>>> integrate with larger infrastructure.
>>>
>>> * There is single monitoring point at the root. There's no way to
>>> delegate management of subtree.
>>>
>>> * The event isn't recursive. It triggers when a cgroup doesn't have
>>> any tasks or child cgroups. Events for internal nodes trigger only
>>> after all children are removed. This again makes it impossible to
>>> delegate management of subtree.
>>>
>>> * Events are filtered from the kernel side. "notify_on_release" file
>>> is used to subscribe to or suppress release event. This is
>>> unnecessarily complicated and probably done this way because event
>>> delivery itself was expensive.
>>>
>>> This patch implements interface file "cgroup.subtree_populated" which
>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>> it or not. Its value is 0 if there is no task in the cgroup and its
>>> descendants; otherwise, 1, and a kernfs_notify() notification is
>>> triggered when the value changes, which can be monitored through poll
>>> and [di]notify.
>>>
>>
>> For the old notification mechanism, the path of the cgroup that becomes
>> empty will be passed to the user specified release agent. Like this:
>>
>> # cat /sbin/cpuset_release_agent
>> #!/bin/sh
>> rmdir /dev/cpuset/$1
>>
>> How do we achieve this using inotify?
>>
>> - monitor all the cgroups, or
>> - monitor all the leaf cgroups, and walk up cgrp->parent to delete all
>> empty cgroups, or
>> - monitor the root cgroup only, and walk the whole hierarchy to find
>> empty cgroups when it gets an fs event.
>>
>> None of them seems scalable.
>
> The manager would add all cgroups as watches to one inotify file
> descriptor; it should not be a problem to do that.
>
I've never used inotify. Thanks for the explanation; after googling a bit,
I think inotify can scale to thousands of cgroups.
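For the first option (monitor all the cgroups), populating the watches amounts to a tree walk that adds one watch per cgroup.populated file. A minimal sketch using POSIX nftw(), assuming the hierarchy is mounted at a caller-supplied path; watch_all_populated() and add_populated_watch() are hypothetical. The root cgroup is skipped naturally, since the patch marks the file CFTYPE_NOT_ON_ROOT. Each watch counts against the per-user fs.inotify.max_user_watches limit, and inotify_add_watch() fails with ENOSPC once it is reached, so that sysctl is one practical bound on "thousands of cgroups". Cgroups created after the walk would need watches added separately, which the sketch leaves out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ftw.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callbacks take no user pointer, so the sketch uses file-scope state. */
static int walk_ifd = -1;      /* shared inotify instance */
static int walk_nr_watches;    /* watches successfully added */

static int add_populated_watch(const char *path, const struct stat *sb,
                               int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    /* Only regular files named "cgroup.populated" are of interest. */
    if (typeflag != FTW_F ||
        strcmp(path + ftwbuf->base, "cgroup.populated") != 0)
        return 0;
    if (inotify_add_watch(walk_ifd, path, IN_MODIFY) < 0)
        return -1;  /* e.g. ENOSPC once max_user_watches is exhausted */
    walk_nr_watches++;
    return 0;
}

/* Returns the number of watches added under @root, or -1 on error. */
static int watch_all_populated(int ifd, const char *root)
{
    walk_ifd = ifd;
    walk_nr_watches = 0;
    if (nftw(root, add_populated_watch, 16, FTW_PHYS) != 0)
        return -1;
    return walk_nr_watches;
}
```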
* [PATCH v3 3/3] cgroup: implement cgroup.populated for the default hierarchy
[not found] ` <1397511846-2904-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2014-04-15 0:57 ` Li Zefan
2014-04-16 2:48 ` Li Zefan
@ 2014-04-16 14:50 ` Tejun Heo
2014-04-16 14:50 ` Tejun Heo
3 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2014-04-16 14:50 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA
Cc: rlove-L7G0xEPcOZbYtjvyW6yDsg,
gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA, kay-tD+1rO4QERM,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lennart-mdGvqq1h2p+GdvJs77BJ7Q, eparis-FjpueFixGhCM4zKIHC2jIg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
john-jueV0HHMeujJJrXXpGQQMAC/G2K4zDHf
Hello,
Renamed to cgroup.populated. git branch updated accordingly.
Thanks.
------- 8< -------
From bf3fd307267ce307c4c8ec41834d7bddb6441907 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Wed, 16 Apr 2014 10:48:38 -0400
cgroup users often need a way to determine when a cgroup's
subhierarchy becomes empty so that it can be cleaned up. cgroup
currently provides release_agent for it; unfortunately, this mechanism
is riddled with issues.
* It delivers events by forking and execing a userland binary
specified as the release_agent. This is a long deprecated method of
notification delivery. It's extremely heavy, slow and cumbersome to
integrate with larger infrastructure.
* There is a single monitoring point at the root. There's no way to
delegate management of a subtree.
* The event isn't recursive. It triggers when a cgroup doesn't have
any tasks or child cgroups. Events for internal nodes trigger only
after all children are removed. This again makes it impossible to
delegate management of a subtree.
* Events are filtered from the kernel side. The "notify_on_release"
file is used to subscribe to or suppress release events. This is
unnecessarily complicated and was probably done this way because event
delivery itself was expensive.
This patch implements the interface file "cgroup.populated", which can
be used to monitor whether the cgroup's subhierarchy has tasks in it or
not. Its value is 0 if there is no task in the cgroup and its
descendants; otherwise, 1. A kernfs_notify() notification is triggered
when the value changes, which can be monitored through poll and
[di]notify.
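The poll path presumably follows the same pattern as polling a sysfs attribute: read the file once, poll() for POLLPRI | POLLERR, then seek back to offset 0 and read again. A minimal sketch, assuming kernfs wakes pollers that way (as sysfs_notify() does for sysfs attributes); read_populated() and wait_populated_change() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read the current value of an already-open "cgroup.populated" file.
 * Seeks back to offset 0 first so the same fd can be re-read after every
 * notification. Returns 0 or 1, or -1 on error.
 */
static int read_populated(int fd)
{
    char buf[16];
    ssize_t n;

    assert(fd >= 0);
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    n = read(fd, buf, sizeof(buf) - 1);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return atoi(buf) != 0;
}

/*
 * Wait for a change notification and return the new value, or -1 on
 * timeout or error. Assumes the file has been read at least once since
 * open(), so that only a later change wakes the poller.
 */
static int wait_populated_change(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;
    return read_populated(fd);
}
```

A monitor would open() the cgroup's cgroup.populated read-only, call read_populated() once to learn the current state, then loop on wait_populated_change(fd, -1) and clean up the cgroup when it returns 0.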
This is a lot lighter and simpler, and trivially allows delegating
management of a subhierarchy: subhierarchy monitoring can block further
propagation simply by putting itself or another process in the root of
the subhierarchy and monitoring the events it's interested in from there,
without interfering with monitoring higher in the tree.
v2: Patch description updated as per Serge.
v3: "cgroup.subtree_populated" renamed to "cgroup.populated". The
subtree_ prefix was a bit confusing because
"cgroup.subtree_control" uses it to denote the tree rooted at the
cgroup sans the cgroup itself while the populated state includes
the cgroup itself.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
Cc: Lennart Poettering <lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org>
---
include/linux/cgroup.h | 15 ++++++++++++
kernel/cgroup.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 76 insertions(+), 4 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ada2392..4b38e2d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -154,6 +154,14 @@ struct cgroup {
/* the number of attached css's */
int nr_css;
+ /*
+ * If this cgroup contains any tasks, it contributes one to
+ * populated_cnt. All children with non-zero populated_cnt of
+ * their own contribute one. The count is zero iff there's no task
+ * in this cgroup or its subtree.
+ */
+ int populated_cnt;
+
atomic_t refcnt;
/*
@@ -166,6 +174,7 @@ struct cgroup {
struct cgroup *parent; /* my parent */
struct kernfs_node *kn; /* cgroup kernfs entry */
struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */
+ struct kernfs_node *populated_kn; /* kn for "cgroup.populated" */
/*
* Monotonically increasing unique serial number which defines a
@@ -264,6 +273,12 @@ enum {
*
* - "cgroup.clone_children" is removed.
*
+ * - "cgroup.populated" is available. Its value is 0 if
+ * the cgroup and its descendants contain no task; otherwise, 1.
+ * The file also generates a kernfs notification, which can be
+ * monitored through poll and [di]notify, when the value of the
+ * file changes.
+ *
* - If mount is requested with sane_behavior but without any
* subsystem, the default unified hierarchy is mounted.
*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7986aa6..2412cb7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -411,6 +411,43 @@ static struct css_set init_css_set = {
static int css_set_count = 1; /* 1 for init_css_set */
+/**
+ * cgroup_update_populated - update populated count of a cgroup
+ * @cgrp: the target cgroup
+ * @populated: inc or dec populated count
+ *
+ * @cgrp is either getting the first task (css_set) or losing the last.
+ * Update @cgrp->populated_cnt accordingly. The count is propagated
+ * towards root so that a given cgroup's populated_cnt is zero iff the
+ * cgroup and all its descendants are empty.
+ *
+ * @cgrp's interface file "cgroup.populated" is zero if
+ * @cgrp->populated_cnt is zero and 1 otherwise. When @cgrp->populated_cnt
+ * changes from or to zero, userland is notified that the content of the
+ * interface file has changed. This can be used to detect when @cgrp and
+ * its descendants become populated or empty.
+ */
+static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
+{
+ lockdep_assert_held(&css_set_rwsem);
+
+ do {
+ bool trigger;
+
+ if (populated)
+ trigger = !cgrp->populated_cnt++;
+ else
+ trigger = !--cgrp->populated_cnt;
+
+ if (!trigger)
+ break;
+
+ if (cgrp->populated_kn)
+ kernfs_notify(cgrp->populated_kn);
+ cgrp = cgrp->parent;
+ } while (cgrp);
+}
+
/*
* hash table for cgroup groups. This improves the performance to find
* an existing css_set. This hash doesn't (currently) take into
@@ -456,10 +493,13 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
list_del(&link->cgrp_link);
/* @cgrp can't go away while we're holding css_set_rwsem */
- if (list_empty(&cgrp->cset_links) && notify_on_release(cgrp)) {
- if (taskexit)
- set_bit(CGRP_RELEASABLE, &cgrp->flags);
- check_for_release(cgrp);
+ if (list_empty(&cgrp->cset_links)) {
+ cgroup_update_populated(cgrp, false);
+ if (notify_on_release(cgrp)) {
+ if (taskexit)
+ set_bit(CGRP_RELEASABLE, &cgrp->flags);
+ check_for_release(cgrp);
+ }
}
kfree(link);
@@ -668,7 +708,11 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
link->cset = cset;
link->cgrp = cgrp;
+
+ if (list_empty(&cgrp->cset_links))
+ cgroup_update_populated(cgrp, true);
list_move(&link->cset_link, &cgrp->cset_links);
+
/*
* Always add links to the tail of the list so that the list
* is sorted by order of hierarchy creation
@@ -2643,6 +2687,12 @@ err_undo_css:
goto out_unlock;
}
+static int cgroup_populated_show(struct seq_file *seq, void *v)
+{
+ seq_printf(seq, "%d\n", (bool)seq_css(seq)->cgroup->populated_cnt);
+ return 0;
+}
+
static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
{
@@ -2809,6 +2859,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
if (cft->seq_show == cgroup_subtree_control_show)
cgrp->control_kn = kn;
+ else if (cft->seq_show == cgroup_populated_show)
+ cgrp->populated_kn = kn;
return 0;
}
@@ -3918,6 +3970,11 @@ static struct cftype cgroup_base_files[] = {
.seq_show = cgroup_subtree_control_show,
.write_string = cgroup_subtree_control_write,
},
+ {
+ .name = "cgroup.populated",
+ .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
+ .seq_show = cgroup_populated_show,
+ },
/*
* Historical crazy stuff. These don't have "cgroup." prefix and
--
1.9.0
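The early break in cgroup_update_populated() is what keeps the update cheap: a parent's count only changes when a child's count crosses zero, so the walk towards the root stops at the first ancestor whose populated state does not change. That propagation can be checked with a small userland model of just the counting logic. struct node and update_populated() are hypothetical stand-ins for struct cgroup and cgroup_update_populated(); locking (css_set_rwsem) and the css_set link bookkeeping are left out, and the root is modeled as having no cgroup.populated file, matching CFTYPE_NOT_ON_ROOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for struct cgroup: only the fields used. */
struct node {
    struct node *parent;
    int populated_cnt;
    int notifications;   /* counts where kernfs_notify() would be called */
};

/* Mirrors the counting in cgroup_update_populated() from the patch. */
static void update_populated(struct node *n, bool populated)
{
    do {
        bool trigger;

        if (populated)
            trigger = !n->populated_cnt++;   /* 0 -> 1 */
        else
            trigger = !--n->populated_cnt;   /* 1 -> 0 */
        assert(n->populated_cnt >= 0);

        if (!trigger)
            break;   /* state unchanged here, so ancestors are unaffected */

        /* The root has no cgroup.populated (CFTYPE_NOT_ON_ROOT). */
        if (n->parent)
            n->notifications++;
        n = n->parent;
    } while (n);
}

/* What "cgroup.populated" would show: 0 or 1. */
static int populated_value(const struct node *n)
{
    return n->populated_cnt != 0;
}
```

With a chain root, a, b: the first task landing in b notifies b and a; a later task in a itself only bumps a's count from 1 to 2, so nothing above a is touched.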
* [PATCH v3 3/3] cgroup: implement cgroup.populated for the default hierarchy
2014-04-14 21:44 ` Tejun Heo
@ 2014-04-16 14:50 ` Tejun Heo
-1 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2014-04-16 14:50 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
john-jueV0HHMeujJJrXXpGQQMAC/G2K4zDHf,
rlove-L7G0xEPcOZbYtjvyW6yDsg, eparis-FjpueFixGhCM4zKIHC2jIg,
gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
lennart-mdGvqq1h2p+GdvJs77BJ7Q, kay-tD+1rO4QERM
Hello,
Renamed to cgroup.populated. git branch updated accordingly.
Thanks.
------- 8< -------
From bf3fd307267ce307c4c8ec41834d7bddb6441907 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Wed, 16 Apr 2014 10:48:38 -0400
cgroup users often need a way to determine when a cgroup's
subhierarchy becomes empty so that it can be cleaned up. cgroup
currently provides release_agent for it; unfortunately, this mechanism
is riddled with issues.
* It delivers events by forking and execing a userland binary
specified as the release_agent. This is a long deprecated method of
notification delivery. It's extremely heavy, slow and cumbersome to
integrate with larger infrastructure.
* There is a single monitoring point at the root. There's no way to
delegate management of a subtree.
* The event isn't recursive. It triggers when a cgroup doesn't have
any tasks or child cgroups. Events for internal nodes trigger only
after all children are removed. This again makes it impossible to
delegate management of a subtree.
* Events are filtered from the kernel side. The "notify_on_release" file
is used to subscribe to or suppress release events. This is
unnecessarily complicated and probably done this way because event
delivery itself was expensive.
This patch implements the interface file "cgroup.populated", which can be
used to monitor whether the cgroup's subhierarchy has tasks in it or
not. Its value is 0 if there is no task in the cgroup and its
descendants; otherwise, 1. A kernfs_notify() notification is
triggered when the value changes, which can be monitored through poll
and [di]notify.
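The following is a minimal userspace sketch of consuming that notification through poll(). It assumes that "cgroup.populated" behaves like other kernfs/sysfs notification files:
- poll() for POLLPRI (POLLERR is always reported).
- After each wakeup, rewind and re-read the file. The read is assumed to re-arm the event.

The function names read_populated() and monitor_populated() are hypothetical. The file descriptor is assumed to be an open "cgroup.populated" in a non-root cgroup on the default hierarchy. The [di]notify route is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Rewind and re-read the file; returns 1, 0, or -1 on error.
 * Assumption: as with sysfs_notify(), reading re-arms the event. */
int read_populated(int fd)
{
	char buf[8];
	ssize_t n;

	assert(fd >= 0);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	if (buf[0] == '0' || buf[0] == '1')
		return buf[0] - '0';
	return -1;
}

/* Print every populated/empty transition until an error occurs. */
void monitor_populated(int fd)
{
	int val = read_populated(fd);	/* initial read arms the event */

	while (val >= 0) {
		struct pollfd pfd = { .fd = fd, .events = POLLPRI };

		printf("populated=%d\n", val);
		if (poll(&pfd, 1, -1) < 0)	/* wait for a change */
			break;
		val = read_populated(fd);
	}
}
```

Waiting only for POLLPRI matters here. The file is otherwise always readable, so asking for POLLIN would make poll() return immediately.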
This is a lot lighter and simpler and trivially allows delegating
management of a subhierarchy: subhierarchy monitoring can block further
propagation simply by putting itself or another process in the root of
the subhierarchy and monitoring the events it's interested in from there
without interfering with monitoring higher in the tree.
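The delegation argument can be modelled in userspace. This is a minimal sketch with the following assumptions:
- toy_cgroup is a hypothetical struct. Its notifications counter stands in for kernfs_notify() calls.
- toy_update_populated() copies the walk in cgroup_update_populated() from the diff.
- The real function also skips the notification when populated_kn is NULL, e.g. on the root, which has no cgroup.populated file. The model does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cgroup: only what the walk needs. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	int populated_cnt;
	int notifications;	/* simulated kernfs_notify() calls */
};

/* Same loop as cgroup_update_populated(): keep walking toward the
 * root only while the count crosses zero at each level. */
static void toy_update_populated(struct toy_cgroup *cgrp, bool populated)
{
	do {
		bool trigger;

		if (populated)
			trigger = !cgrp->populated_cnt++;
		else
			trigger = !--cgrp->populated_cnt;
		assert(cgrp->populated_cnt >= 0);

		if (!trigger)
			break;

		cgrp->notifications++;
		cgrp = cgrp->parent;
	} while (cgrp);
}
```

Suppose a manager process sits in the delegated cgroup. That cgroup's count then never drops to zero, so any populate or depopulate event in a leaf below it stops the walk at the delegated cgroup. Ancestors above it see no further notifications.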
v2: Patch description updated as per Serge.
v3: "cgroup.subtree_populated" renamed to "cgroup.populated". The
subtree_ prefix was a bit confusing because
"cgroup.subtree_control" uses it to denote the tree rooted at the
cgroup sans the cgroup itself while the populated state includes
the cgroup itself.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Lennart Poettering <lennart@poettering.net>
---
include/linux/cgroup.h | 15 ++++++++++++
kernel/cgroup.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 76 insertions(+), 4 deletions(-)
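The counting scheme in the diff below can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code:
- toy_cgroup and its nr_csets counter are hypothetical. nr_csets stands in for the cset_links list, with linked css_sets treated as a proxy for tasks, as the patch does.
- The helper names are also hypothetical.

The model mirrors the two call sites. link_css_set() tests for an empty list before linking, and put_css_set_locked() tests after unlinking. expected_cnt() recomputes the invariant stated in the populated_cnt comment from scratch, so the incremental counts can be compared against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cgroup {
	struct toy_cgroup *parent;
	int nr_csets;		/* stand-in for the cset_links list */
	int populated_cnt;
};

static void toy_update_populated(struct toy_cgroup *cgrp, bool populated)
{
	do {
		bool trigger = populated ? !cgrp->populated_cnt++
					 : !--cgrp->populated_cnt;
		if (!trigger)
			break;
		cgrp = cgrp->parent;
	} while (cgrp);
}

/* Like link_css_set(): check emptiness *before* adding the link. */
static void toy_link_cset(struct toy_cgroup *cgrp)
{
	if (cgrp->nr_csets == 0)
		toy_update_populated(cgrp, true);
	cgrp->nr_csets++;
}

/* Like put_css_set_locked(): check emptiness *after* removing it. */
static void toy_unlink_cset(struct toy_cgroup *cgrp)
{
	assert(cgrp->nr_csets > 0);
	if (--cgrp->nr_csets == 0)
		toy_update_populated(cgrp, false);
}

/* Invariant: one for own css_sets plus one per non-empty child. */
static int expected_cnt(struct toy_cgroup *nodes, size_t n,
			struct toy_cgroup *cgrp)
{
	int cnt = cgrp->nr_csets > 0;

	for (size_t i = 0; i < n; i++)
		if (nodes[i].parent == cgrp &&
		    expected_cnt(nodes, n, &nodes[i]))
			cnt++;
	return cnt;
}
```

Only the first link and the last unlink touch populated_cnt. Further css_sets joining or leaving a populated cgroup cost nothing beyond the list operation.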
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ada2392..4b38e2d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -154,6 +154,14 @@ struct cgroup {
/* the number of attached css's */
int nr_css;
+ /*
+ * If this cgroup contains any tasks, it contributes one to
+ * populated_cnt. All children with non-zero populated_cnt of
+ * their own contribute one. The count is zero iff there's no task
+ * in this cgroup or its subtree.
+ */
+ int populated_cnt;
+
atomic_t refcnt;
/*
@@ -166,6 +174,7 @@ struct cgroup {
struct cgroup *parent; /* my parent */
struct kernfs_node *kn; /* cgroup kernfs entry */
struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */
+ struct kernfs_node *populated_kn; /* kn for "cgroup.populated" */
/*
* Monotonically increasing unique serial number which defines a
@@ -264,6 +273,12 @@ enum {
*
* - "cgroup.clone_children" is removed.
*
+ * - "cgroup.populated" is available. Its value is 0 if
+ * the cgroup and its descendants contain no task; otherwise, 1.
+ * The file also generates kernfs notification which can be
+ * monitored through poll and [di]notify when the value of the
+ * file changes.
+ *
* - If mount is requested with sane_behavior but without any
* subsystem, the default unified hierarchy is mounted.
*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7986aa6..2412cb7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -411,6 +411,43 @@ static struct css_set init_css_set = {
static int css_set_count = 1; /* 1 for init_css_set */
+/**
+ * cgroup_update_populated - update populated count of a cgroup
+ * @cgrp: the target cgroup
+ * @populated: inc or dec populated count
+ *
+ * @cgrp is either getting the first task (css_set) or losing the last.
+ * Update @cgrp->populated_cnt accordingly. The count is propagated
+ * towards root so that a given cgroup's populated_cnt is zero iff the
+ * cgroup and all its descendants are empty.
+ *
+ * @cgrp's interface file "cgroup.populated" is zero if
+ * @cgrp->populated_cnt is zero and 1 otherwise. When @cgrp->populated_cnt
+ * changes from or to zero, userland is notified that the content of the
+ * interface file has changed. This can be used to detect when @cgrp and
+ * its descendants become populated or empty.
+ */
+static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
+{
+ lockdep_assert_held(&css_set_rwsem);
+
+ do {
+ bool trigger;
+
+ if (populated)
+ trigger = !cgrp->populated_cnt++;
+ else
+ trigger = !--cgrp->populated_cnt;
+
+ if (!trigger)
+ break;
+
+ if (cgrp->populated_kn)
+ kernfs_notify(cgrp->populated_kn);
+ cgrp = cgrp->parent;
+ } while (cgrp);
+}
+
/*
* hash table for cgroup groups. This improves the performance to find
* an existing css_set. This hash doesn't (currently) take into
@@ -456,10 +493,13 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
list_del(&link->cgrp_link);
/* @cgrp can't go away while we're holding css_set_rwsem */
- if (list_empty(&cgrp->cset_links) && notify_on_release(cgrp)) {
- if (taskexit)
- set_bit(CGRP_RELEASABLE, &cgrp->flags);
- check_for_release(cgrp);
+ if (list_empty(&cgrp->cset_links)) {
+ cgroup_update_populated(cgrp, false);
+ if (notify_on_release(cgrp)) {
+ if (taskexit)
+ set_bit(CGRP_RELEASABLE, &cgrp->flags);
+ check_for_release(cgrp);
+ }
}
kfree(link);
@@ -668,7 +708,11 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
link->cset = cset;
link->cgrp = cgrp;
+
+ if (list_empty(&cgrp->cset_links))
+ cgroup_update_populated(cgrp, true);
list_move(&link->cset_link, &cgrp->cset_links);
+
/*
* Always add links to the tail of the list so that the list
* is sorted by order of hierarchy creation
@@ -2643,6 +2687,12 @@ err_undo_css:
goto out_unlock;
}
+static int cgroup_populated_show(struct seq_file *seq, void *v)
+{
+ seq_printf(seq, "%d\n", (bool)seq_css(seq)->cgroup->populated_cnt);
+ return 0;
+}
+
static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
{
@@ -2809,6 +2859,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
if (cft->seq_show == cgroup_subtree_control_show)
cgrp->control_kn = kn;
+ else if (cft->seq_show == cgroup_populated_show)
+ cgrp->populated_kn = kn;
return 0;
}
@@ -3918,6 +3970,11 @@ static struct cftype cgroup_base_files[] = {
.seq_show = cgroup_subtree_control_show,
.write_string = cgroup_subtree_control_write,
},
+ {
+ .name = "cgroup.populated",
+ .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
+ .seq_show = cgroup_populated_show,
+ },
/*
* Historical crazy stuff. These don't have "cgroup." prefix and
--
1.9.0
* Re: [PATCH v3 3/3] cgroup: implement cgroup.populated for the default hierarchy
2014-04-16 14:50 ` Tejun Heo
@ 2014-04-17 1:23 ` Li Zefan
-1 siblings, 0 replies; 36+ messages in thread
From: Li Zefan @ 2014-04-17 1:23 UTC (permalink / raw)
To: Tejun Heo
Cc: containers, cgroups, linux-kernel, john, rlove, eparis, gregkh,
serge.hallyn, lennart, kay
> cgroup users often need a way to determine when a cgroup's
> subhierarchy becomes empty so that it can be cleaned up. cgroup
> currently provides release_agent for it; unfortunately, this mechanism
> is riddled with issues.
>
> * It delivers events by forking and execing a userland binary
> specified as the release_agent. This is a long deprecated method of
> notification delivery. It's extremely heavy, slow and cumbersome to
> integrate with larger infrastructure.
>
> * There is a single monitoring point at the root. There's no way to
> delegate management of a subtree.
>
> * The event isn't recursive. It triggers when a cgroup doesn't have
> any tasks or child cgroups. Events for internal nodes trigger only
> after all children are removed. This again makes it impossible to
> delegate management of a subtree.
>
> * Events are filtered from the kernel side. The "notify_on_release" file
> is used to subscribe to or suppress release events. This is
> unnecessarily complicated and probably done this way because event
> delivery itself was expensive.
>
> This patch implements the interface file "cgroup.populated", which can be
> used to monitor whether the cgroup's subhierarchy has tasks in it or
> not. Its value is 0 if there is no task in the cgroup and its
> descendants; otherwise, 1. A kernfs_notify() notification is
> triggered when the value changes, which can be monitored through poll
> and [di]notify.
>
> This is a lot lighter and simpler and trivially allows delegating
> management of a subhierarchy: subhierarchy monitoring can block further
> propagation simply by putting itself or another process in the root of
> the subhierarchy and monitoring the events it's interested in from there
> without interfering with monitoring higher in the tree.
>
> v2: Patch description updated as per Serge.
>
> v3: "cgroup.subtree_populated" renamed to "cgroup.populated". The
> subtree_ prefix was a bit confusing because
> "cgroup.subtree_control" uses it to denote the tree rooted at the
> cgroup sans the cgroup itself while the populated state includes
> the cgroup itself.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> Cc: Lennart Poettering <lennart@poettering.net>
Acked-by: Li Zefan <lizefan@huawei.com>