cgroups.vger.kernel.org archive mirror
From: "Serge E. Hallyn" <serge@hallyn.com>
To: Tejun Heo <tj@kernel.org>
Cc: gregkh@linuxfoundation.org, rlove@rlove.org,
	containers@lists.linux-foundation.org, serge.hallyn@ubuntu.com,
	kay@vrfy.org, linux-kernel@vger.kernel.org,
	lennart@poettering.net, cgroups@vger.kernel.org,
	eparis@parisplace.org, john@johnmccutchan.com
Subject: Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
Date: Thu, 10 Apr 2014 05:08:55 +0200
Message-ID: <20140410030855.GA29658@mail.hallyn.com>
In-Reply-To: <1397056052-2829-4-git-send-email-tj@kernel.org>

Quoting Tejun Heo (tj@kernel.org):
> cgroup users often need a way to determine when a cgroup's
> subhierarchy becomes empty so that it can be cleaned up.  cgroup
> currently provides release_agent for it; unfortunately, this mechanism
> is riddled with issues.

Thanks, Tejun.

> * It delivers events by forking and execing a userland binary
>   specified as the release_agent.  This is a long deprecated method of
>   notification delivery.  It's extremely heavy, slow and cumbersome to
>   integrate with larger infrastructure.

(Not seriously worried about this, but it's a point worth considering)
It does have one advantage though:  if the userspace agent goes bad,
cgroups can still be removed on empty.

Do you plan on keeping release-on-empty around?  I assume only for a
while?

Do you think there is any value in having a simpler "remove-when-empty"
file?  Doesn't call out to userspace, just drops the cgroup when there
are no more tasks or sub-cgroups?
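
To make the question concrete, here is roughly what I'd expect every
agent to end up writing on top of this patch to get remove-when-empty
behaviour.  It is only a sketch under a few assumptions: the helper
names read_populated() and remove_when_empty() are made up, it assumes
the file reads back as "0\n" or "1\n" as the patch describes, and that
a kernfs notification wakes poll() with POLLPRI the way sysfs
attributes do.  It only handles a cgroup with no child cgroups and
does not retry if a task sneaks in between the read and the rmdir.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Re-read the 0/1 value from the start of an already-open file.
 * Returns 1 if populated, 0 if empty, -1 on error or empty read.
 */
static int read_populated(int fd)
{
	char buf[8];

	assert(fd >= 0);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	if (read(fd, buf, sizeof(buf)) <= 0)
		return -1;
	return buf[0] != '0';
}

/*
 * Hypothetical helper: block until the cgroup directory @dir reports
 * an empty subtree, then try to rmdir it once.
 * Returns 0 on success, -1 on failure.
 */
static int remove_when_empty(const char *dir)
{
	char path[4096];
	struct pollfd pfd = { .fd = -1, .events = POLLPRI };
	int ret = -1;

	if (snprintf(path, sizeof(path), "%s/cgroup.subtree_populated",
		     dir) >= (int)sizeof(path))
		return -1;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	for (;;) {
		/* Reading also re-arms the notification for the next poll(). */
		int val = read_populated(pfd.fd);

		if (val < 0)
			break;
		if (val == 0) {
			/* Fails if a task raced in or child cgroups remain. */
			ret = rmdir(dir);
			break;
		}
		/* Sleep until the kernel says the value changed. */
		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			break;
	}

	close(pfd.fd);
	return ret;
}
```

That is not a lot of code, but it is a long-running process per
delegated subtree, which is the part that goes away with an in-kernel
remove-when-empty knob.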

> * There is a single monitoring point at the root.  There's no way to
>   delegate management of a subtree.
> 
> * The event isn't recursive.  It triggers when a cgroup doesn't have
>   any tasks or child cgroups.  Events for internal nodes trigger only
>   after all children are removed.  This again makes it impossible to
>   delegate management of a subtree.
> 
> * Events are filtered from the kernel side.  "notify_on_release" file
>   is used to subscribe to or suppress release events and events are not
>   generated if a cgroup becomes empty by moving the last task out of
>   it; however, event is generated if it becomes empty because the last
>   child cgroup is removed.  This is inconsistent, awkward and

Hm, maybe I'm misreading but this doesn't seem right.  If I move
a task into x1 and kill the task, x1 goes away.  Likewise if I
create x1/y1, and rmdir y1, x1 goes away.  I suspect I'm misunderstanding
the case in which you say it doesn't happen?

>   unnecessarily complicated and probably done this way because event
>   delivery itself was expensive.
> 
> This patch implements interface file "cgroup.subtree_populated" which
> can be used to monitor whether the cgroup's subhierarchy has tasks in
> it or not.  Its value is 1 if there is no task in the cgroup and its

I think you meant this backward?  It's 1 if there is *any* task in
the cgroup and its descendants, else 0?
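
In case it helps anyone else reading along, this is how I read the
counting in cgroup_update_populated() further down, boiled down to a
userspace toy.  struct toy_cgroup and both toy_* helpers are made up
for illustration; locking and the kernfs_notify() call are left out,
and the assert is mine, not the patch's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup: only the fields the counting needs. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	int populated_cnt;
};

/*
 * Same loop shape as cgroup_update_populated() in the patch: adjust
 * this cgroup's count and keep walking towards the root only while
 * the count crosses zero (0 -> 1 on inc, 1 -> 0 on dec).
 */
static void toy_update_populated(struct toy_cgroup *cgrp, bool populated)
{
	do {
		bool trigger;

		if (populated) {
			trigger = !cgrp->populated_cnt++;
		} else {
			assert(cgrp->populated_cnt > 0);
			trigger = !--cgrp->populated_cnt;
		}

		if (!trigger)
			break;

		/* The kernel would call kernfs_notify() here. */
		cgrp = cgrp->parent;
	} while (cgrp);
}

/* What the interface file would show for @cgrp. */
static int toy_subtree_populated(const struct toy_cgroup *cgrp)
{
	return cgrp->populated_cnt != 0;
}
```

With root/x1/y1, putting a task in y1 makes all three read 1, and x1
only drops back to 0 once both y1 and x1 itself have lost their tasks.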

> descendants; otherwise, 0, and kernfs_notify() notification is
> triggered when the value changes, which can be monitored through poll
> and [di]notify.
> 
> This is a lot lighter and simpler and trivially allows delegating
> management of a subhierarchy - subhierarchy monitoring can block further
> propagation simply by putting itself or another process in the root of
> the subhierarchy and monitoring events that it's interested in from there
> without interfering with monitoring higher in the tree.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Serge Hallyn <serge.hallyn@ubuntu.com>

Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

> Cc: Lennart Poettering <lennart@poettering.net>
> ---
>  include/linux/cgroup.h | 15 ++++++++++++
>  kernel/cgroup.c        | 65 ++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index dee6f3c..e45d87f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -154,6 +154,14 @@ struct cgroup {
>  	/* the number of attached css's */
>  	int nr_css;
>  
> +	/*
> +	 * If this cgroup contains any tasks, it contributes one to
> +	 * populated_cnt.  All children with non-zero populated_cnt of
> +	 * their own contribute one.  The count is zero iff there's no task
> +	 * in this cgroup or its subtree.
> +	 */
> +	int populated_cnt;
> +
>  	atomic_t refcnt;
>  
>  	/*
> @@ -166,6 +174,7 @@ struct cgroup {
>  	struct cgroup *parent;		/* my parent */
>  	struct kernfs_node *kn;		/* cgroup kernfs entry */
>  	struct kernfs_node *control_kn;	/* kn for "cgroup.subtree_control" */
> +	struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
>  
>  	/*
>  	 * Monotonically increasing unique serial number which defines a
> @@ -264,6 +273,12 @@ enum {
>  	 *
>  	 * - "cgroup.clone_children" is removed.
>  	 *
> +	 * - "cgroup.subtree_populated" is available.  Its value is 0 if
> +	 *   the cgroup and its descendants contain no task; otherwise, 1.
> +	 *   The file also generates kernfs notification which can be
> +	 *   monitored through poll and [di]notify when the value of the
> +	 *   file changes.
> +	 *
>  	 * - If mount is requested with sane_behavior but without any
>  	 *   subsystem, the default unified hierarchy is mounted.
>  	 *
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 4e958c7..17f0a09 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -411,6 +411,43 @@ static struct css_set init_css_set = {
>  
>  static int css_set_count	= 1;	/* 1 for init_css_set */
>  
> +/**
> + * cgroup_update_populated - update populated count of a cgroup
> + * @cgrp: the target cgroup
> + * @populated: inc or dec populated count
> + *
> + * @cgrp is either getting the first task (css_set) or losing the last.
> + * Update @cgrp->populated_cnt accordingly.  The count is propagated
> + * towards root so that a given cgroup's populated_cnt is zero iff the
> + * cgroup and all its descendants are empty.
> + *
> + * @cgrp's interface file "cgroup.subtree_populated" is zero if
> + * @cgrp->populated_cnt is zero and 1 otherwise.  When @cgrp->populated_cnt
> + * changes from or to zero, userland is notified that the content of the
> + * interface file has changed.  This can be used to detect when @cgrp and
> + * its descendants become populated or empty.
> + */
> +static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
> +{
> +	lockdep_assert_held(&css_set_rwsem);
> +
> +	do {
> +		bool trigger;
> +
> +		if (populated)
> +			trigger = !cgrp->populated_cnt++;
> +		else
> +			trigger = !--cgrp->populated_cnt;
> +
> +		if (!trigger)
> +			break;
> +
> +		if (cgrp->populated_kn)
> +			kernfs_notify(cgrp->populated_kn);
> +		cgrp = cgrp->parent;
> +	} while (cgrp);
> +}
> +
>  /*
>   * hash table for cgroup groups. This improves the performance to find
>   * an existing css_set. This hash doesn't (currently) take into
> @@ -456,10 +493,13 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
>  		list_del(&link->cgrp_link);
>  
>  		/* @cgrp can't go away while we're holding css_set_rwsem */
> -		if (list_empty(&cgrp->cset_links) && notify_on_release(cgrp)) {
> -			if (taskexit)
> -				set_bit(CGRP_RELEASABLE, &cgrp->flags);
> -			check_for_release(cgrp);
> +		if (list_empty(&cgrp->cset_links)) {
> +			cgroup_update_populated(cgrp, false);
> +			if (notify_on_release(cgrp)) {
> +				if (taskexit)
> +					set_bit(CGRP_RELEASABLE, &cgrp->flags);
> +				check_for_release(cgrp);
> +			}
>  		}
>  
>  		kfree(link);
> @@ -668,7 +708,11 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
>  	link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
>  	link->cset = cset;
>  	link->cgrp = cgrp;
> +
> +	if (list_empty(&cgrp->cset_links))
> +		cgroup_update_populated(cgrp, true);
>  	list_move(&link->cset_link, &cgrp->cset_links);
> +
>  	/*
>  	 * Always add links to the tail of the list so that the list
>  	 * is sorted by order of hierarchy creation
> @@ -2633,6 +2677,12 @@ err_undo_css:
>  	goto out_unlock;
>  }
>  
> +static int cgroup_subtree_populated_show(struct seq_file *seq, void *v)
> +{
> +	seq_printf(seq, "%d\n", (bool)seq_css(seq)->cgroup->populated_cnt);
> +	return 0;
> +}
> +
>  static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
>  				 size_t nbytes, loff_t off)
>  {
> @@ -2775,6 +2825,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
>  				  NULL, false, key);
>  	if (cft->seq_show == cgroup_subtree_control_show)
>  		cgrp->control_kn = kn;
> +	else if (cft->seq_show == cgroup_subtree_populated_show)
> +		cgrp->populated_kn = kn;
>  	return PTR_ERR_OR_ZERO(kn);
>  }
>  
> @@ -3883,6 +3935,11 @@ static struct cftype cgroup_base_files[] = {
>  		.seq_show = cgroup_subtree_control_show,
>  		.write_string = cgroup_subtree_control_write,
>  	},
> +	{
> +		.name = "cgroup.subtree_populated",
> +		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> +		.seq_show = cgroup_subtree_populated_show,
> +	},
>  
>  	/*
>  	 * Historical crazy stuff.  These don't have "cgroup."  prefix and
> -- 
> 1.9.0
