From: "Serge E. Hallyn" <serge@hallyn.com>
To: Tejun Heo <tj@kernel.org>
Cc: gregkh@linuxfoundation.org, rlove@rlove.org,
containers@lists.linux-foundation.org, serge.hallyn@ubuntu.com,
kay@vrfy.org, linux-kernel@vger.kernel.org,
lennart@poettering.net, cgroups@vger.kernel.org,
eparis@parisplace.org, john@johnmccutchan.com
Subject: Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
Date: Thu, 10 Apr 2014 05:08:55 +0200
Message-ID: <20140410030855.GA29658@mail.hallyn.com>
In-Reply-To: <1397056052-2829-4-git-send-email-tj@kernel.org>
Quoting Tejun Heo (tj@kernel.org):
> cgroup users often need a way to determine when a cgroup's
> subhierarchy becomes empty so that it can be cleaned up. cgroup
> currently provides release_agent for it; unfortunately, this mechanism
> is riddled with issues.
Thanks, Tejun.
> * It delivers events by forking and execing a userland binary
> specified as the release_agent. This is a long deprecated method of
> notification delivery. It's extremely heavy, slow and cumbersome to
> integrate with larger infrastructure.
(Not seriously worried about this, but it's a point worth considering)
It does have one advantage though: if the userspace agent goes bad,
cgroups can still be removed on empty.
Do you plan on keeping release-on-empty around? I assume only for a
while?
Do you think there is any value in having a simpler "remove-when-empty"
file? It wouldn't call out to userspace, it would just drop the cgroup
once there are no more tasks or sub-cgroups in it.
> * There is a single monitoring point at the root. There's no way to
> delegate management of a subtree.
>
> * The event isn't recursive. It triggers when a cgroup doesn't have
> any tasks or child cgroups. Events for internal nodes trigger only
> after all children are removed. This again makes it impossible to
> delegate management of a subtree.
>
> * Events are filtered from the kernel side. "notify_on_release" file
> is used to subscribe to or suppress release events and events are not
> generated if a cgroup becomes empty by moving the last task out of
> it; however, an event is generated if it becomes empty because the last
> child cgroup is removed. This is inconsistent, awkward and
Hm, maybe I'm misreading but this doesn't seem right. If I move
a task into x1 and kill the task, x1 goes away. Likewise if I
create x1/y1, and rmdir y1, x1 goes away. I suspect I'm misunderstanding
the case in which you say it doesn't happen?
> unnecessarily complicated and probably done this way because event
> delivery itself was expensive.
>
> This patch implements interface file "cgroup.subtree_populated" which
> can be used to monitor whether the cgroup's subhierarchy has tasks in
> it or not. Its value is 1 if there is no task in the cgroup and its
I think you meant this backward? It's 1 if there is *any* task in
the cgroup and its descendants, else 0?
> descendants; otherwise, 0, and a kernfs_notify() notification is
> triggered when the value changes, which can be monitored through poll
> and [di]notify.
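For reference, a minimal userspace sketch of the poll side, assuming the
file behaves like other kernfs attribute files (a change wakes pollers
waiting on POLLPRI, and the value has to be re-read from offset 0). The
helper names open_populated(), read_populated() and
wait_populated_change() are made up for illustration and are not part of
the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Open "<cgroup_dir>/cgroup.subtree_populated" read-only; -1 on error. */
static int open_populated(const char *cgroup_dir)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/cgroup.subtree_populated",
			 cgroup_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return open(path, O_RDONLY);
}

/* Re-read the flag: 0 or 1 on success, -1 on error or odd content. */
static int read_populated(int fd)
{
	char buf[8];
	ssize_t n;

	assert(fd >= 0);

	/* attribute files are re-read from the start after each event */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0 || (buf[0] != '0' && buf[0] != '1'))
		return -1;
	return buf[0] - '0';
}

/*
 * Wait for a change notification or until @timeout_ms expires.
 * Returns the fresh value (0 or 1), -1 on error, -2 on timeout.
 */
static int wait_populated_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return -2;
	return read_populated(fd);
}
```

A delegated manager would call read_populated() once after opening the
file to get the initial state, then loop on wait_populated_change() and
clean up the subtree when it returns 0.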
>
> This is a lot lighter and simpler and trivially allows delegating
> management of subhierarchy - subhierarchy monitoring can block further
> propagation simply by putting itself or another process in the root of
> the subhierarchy and monitoring events that it's interested in from there
> without interfering with monitoring higher in the tree.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> Cc: Lennart Poettering <lennart@poettering.net>
> ---
> include/linux/cgroup.h | 15 ++++++++++++
> kernel/cgroup.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index dee6f3c..e45d87f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -154,6 +154,14 @@ struct cgroup {
> /* the number of attached css's */
> int nr_css;
>
> + /*
> + * If this cgroup contains any tasks, it contributes one to
> + * populated_cnt. All children with non-zero populated_cnt of
> + * their own contribute one. The count is zero iff there's no task
> + * in this cgroup or its subtree.
> + */
> + int populated_cnt;
> +
> atomic_t refcnt;
>
> /*
> @@ -166,6 +174,7 @@ struct cgroup {
> struct cgroup *parent; /* my parent */
> struct kernfs_node *kn; /* cgroup kernfs entry */
> struct kernfs_node *control_kn; /* kn for "cgroup.subtree_control" */
> + struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
>
> /*
> * Monotonically increasing unique serial number which defines a
> @@ -264,6 +273,12 @@ enum {
> *
> * - "cgroup.clone_children" is removed.
> *
> + * - "cgroup.subtree_populated" is available. Its value is 0 if
> + * the cgroup and its descendants contain no task; otherwise, 1.
> + * The file also generates kernfs notification which can be
> + * monitored through poll and [di]notify when the value of the
> + * file changes.
> + *
> * - If mount is requested with sane_behavior but without any
> * subsystem, the default unified hierarchy is mounted.
> *
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 4e958c7..17f0a09 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -411,6 +411,43 @@ static struct css_set init_css_set = {
>
> static int css_set_count = 1; /* 1 for init_css_set */
>
> +/**
> + * cgroup_update_populated - update populated count of a cgroup
> + * @cgrp: the target cgroup
> + * @populated: inc or dec populated count
> + *
> + * @cgrp is either getting the first task (css_set) or losing the last.
> + * Update @cgrp->populated_cnt accordingly. The count is propagated
> + * towards root so that a given cgroup's populated_cnt is zero iff the
> + * cgroup and all its descendants are empty.
> + *
> + * @cgrp's interface file "cgroup.subtree_populated" is zero if
> + * @cgrp->populated_cnt is zero and 1 otherwise. When @cgrp->populated_cnt
> + * changes from or to zero, userland is notified that the content of the
> + * interface file has changed. This can be used to detect when @cgrp and
> + * its descendants become populated or empty.
> + */
> +static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
> +{
> + lockdep_assert_held(&css_set_rwsem);
> +
> + do {
> + bool trigger;
> +
> + if (populated)
> + trigger = !cgrp->populated_cnt++;
> + else
> + trigger = !--cgrp->populated_cnt;
> +
> + if (!trigger)
> + break;
> +
> + if (cgrp->populated_kn)
> + kernfs_notify(cgrp->populated_kn);
> + cgrp = cgrp->parent;
> + } while (cgrp);
> +}
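A small userspace model of the loop above can help when checking the
propagation rule. It is only a sketch: struct node and
update_populated() are hypothetical stand-ins for struct cgroup and
cgroup_update_populated(), locking is left out, and a per-node counter
takes the place of kernfs_notify(). Unlike the patch, the model counts
an event for the root as well; in the patch the root has no
populated_kn because of CFTYPE_NOT_ON_ROOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup: only what the loop touches. */
struct node {
	struct node *parent;
	int populated_cnt;
	int notifications;	/* counts would-be kernfs_notify() calls */
};

/*
 * Same control flow as the patch: bump or drop the count, and keep
 * walking towards the root only while the count crosses zero.
 */
static void update_populated(struct node *n, bool populated)
{
	do {
		bool trigger;

		if (populated) {
			trigger = !n->populated_cnt++;
		} else {
			assert(n->populated_cnt > 0);
			trigger = !--n->populated_cnt;
		}

		if (!trigger)
			break;

		n->notifications++;
		n = n->parent;
	} while (n);
}
```

With a tree root/x1/{y1,y2}, populating y1 marks y1, x1 and root as
populated. Populating y2 afterwards stops at x1, because x1's count
only goes from 1 to 2. x1 and root see their next event only when both
y1 and y2 have become empty again.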
> +
> /*
> * hash table for cgroup groups. This improves the performance to find
> * an existing css_set. This hash doesn't (currently) take into
> @@ -456,10 +493,13 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
> list_del(&link->cgrp_link);
>
> /* @cgrp can't go away while we're holding css_set_rwsem */
> - if (list_empty(&cgrp->cset_links) && notify_on_release(cgrp)) {
> - if (taskexit)
> - set_bit(CGRP_RELEASABLE, &cgrp->flags);
> - check_for_release(cgrp);
> + if (list_empty(&cgrp->cset_links)) {
> + cgroup_update_populated(cgrp, false);
> + if (notify_on_release(cgrp)) {
> + if (taskexit)
> + set_bit(CGRP_RELEASABLE, &cgrp->flags);
> + check_for_release(cgrp);
> + }
> }
>
> kfree(link);
> @@ -668,7 +708,11 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
> link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
> link->cset = cset;
> link->cgrp = cgrp;
> +
> + if (list_empty(&cgrp->cset_links))
> + cgroup_update_populated(cgrp, true);
> list_move(&link->cset_link, &cgrp->cset_links);
> +
> /*
> * Always add links to the tail of the list so that the list
> * is sorted by order of hierarchy creation
> @@ -2633,6 +2677,12 @@ err_undo_css:
> goto out_unlock;
> }
>
> +static int cgroup_subtree_populated_show(struct seq_file *seq, void *v)
> +{
> + seq_printf(seq, "%d\n", (bool)seq_css(seq)->cgroup->populated_cnt);
> + return 0;
> +}
> +
> static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
> size_t nbytes, loff_t off)
> {
> @@ -2775,6 +2825,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
> NULL, false, key);
> if (cft->seq_show == cgroup_subtree_control_show)
> cgrp->control_kn = kn;
> + else if (cft->seq_show == cgroup_subtree_populated_show)
> + cgrp->populated_kn = kn;
> return PTR_ERR_OR_ZERO(kn);
> }
>
> @@ -3883,6 +3935,11 @@ static struct cftype cgroup_base_files[] = {
> .seq_show = cgroup_subtree_control_show,
> .write_string = cgroup_subtree_control_write,
> },
> + {
> + .name = "cgroup.subtree_populated",
> + .flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> + .seq_show = cgroup_subtree_populated_show,
> + },
>
> /*
> * Historical crazy stuff. These don't have "cgroup." prefix and
> --
> 1.9.0
>