All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serge@hallyn.com>
To: Tejun Heo <tj@kernel.org>
Cc: gregkh@linuxfoundation.org, rlove@rlove.org,
	containers@lists.linux-foundation.org, serge.hallyn@ubuntu.com,
	kay@vrfy.org, linux-kernel@vger.kernel.org,
	lennart@poettering.net, cgroups@vger.kernel.org,
	eparis@parisplace.org, john@johnmccutchan.com
Subject: Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy
Date: Thu, 10 Apr 2014 05:08:55 +0200	[thread overview]
Message-ID: <20140410030855.GA29658@mail.hallyn.com> (raw)
In-Reply-To: <1397056052-2829-4-git-send-email-tj@kernel.org>

Quoting Tejun Heo (tj@kernel.org):
> cgroup users often need a way to determine when a cgroup's
> subhierarchy becomes empty so that it can be cleaned up.  cgroup
> currently provides release_agent for it; unfortunately, this mechanism
> is riddled with issues.

Thanks, Tejun.

> * It delivers events by forking and execing a userland binary
>   specified as the release_agent.  This is a long deprecated method of
>   notification delivery.  It's extremely heavy, slow and cumbersome to
>   integrate with larger infrastructure.

(Not seriously worried about this, but it's a point worth considering)
It does have one advantage though:  if the userspace agent goes bad,
cgroups can still be removed on empty.

Do you plan on keeping release-on-empty around?  I assume only for a
while?

Do you think there is any value in having a simpler "remove-when-empty"
file?  Doesn't call out to userspace, just drops the cgroup when there
are no more tasks or sub-cgroups?

> * There is single monitoring point at the root.  There's no way to
>   delegate management of subtree.
> 
> * The event isn't recursive.  It triggers when a cgroup doesn't have
>   any tasks or child cgroups.  Events for internal nodes trigger only
>   after all children are removed.  This again makes it impossible to
>   delegate management of subtree.
> 
> * Events are filtered from the kernel side.  "notify_on_release" file
>   is used to subscribe to or suppres release event and events are not
>   generated if a cgroup becomes empty by moving the last task out of
>   it; however, event is generated if it becomes empty because the last
>   child cgroup is removed.  This is inconsistent, awkward and

Hm, maybe I'm misreading but this doesn't seem right.  If I move
a task into x1 and kill the task, x1 goes away.  Likewise if I
create x1/y1, and rmdir y1, x1 goes away.  I suspect I'm misunderstanding
the case in which you say it doesn't happen?

>   unnecessarily complicated and probably done this way because event
>   delivery itself was expensive.
> 
> This patch implements interface file "cgroup.subtree_populated" which
> can be used to monitor whether the cgroup's subhierarchy has tasks in
> it or not.  Its value is 1 if there is no task in the cgroup and its

I think you meant this backward?  It's 1 if there is *any task in
the cgroup and its descendants, else 0?

> descendants; otherwise, 0, and kernfs_notify() notificaiton is
> triggers when the value changes, which can be monitored through poll
> and [di]notify.
> 
> This is a lot ligther and simpler and trivially allows delegating
> management of subhierarchy - subhierarchy monitoring can block further
> propgation simply by putting itself or another process in the root of
> the subhierarchy and monitor events that it's interested in from there
> without interfering with monitoring higher in the tree.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Serge Hallyn <serge.hallyn@ubuntu.com>

Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

> Cc: Lennart Poettering <lennart@poettering.net>
> ---
>  include/linux/cgroup.h | 15 ++++++++++++
>  kernel/cgroup.c        | 65 ++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index dee6f3c..e45d87f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -154,6 +154,14 @@ struct cgroup {
>  	/* the number of attached css's */
>  	int nr_css;
>  
> +	/*
> +	 * If this cgroup contains any tasks, it contributes one to
> +	 * populated_cnt.  All children with non-zero popuplated_cnt of
> +	 * their own contribute one.  The count is zero iff there's no task
> +	 * in this cgroup or its subtree.
> +	 */
> +	int populated_cnt;
> +
>  	atomic_t refcnt;
>  
>  	/*
> @@ -166,6 +174,7 @@ struct cgroup {
>  	struct cgroup *parent;		/* my parent */
>  	struct kernfs_node *kn;		/* cgroup kernfs entry */
>  	struct kernfs_node *control_kn;	/* kn for "cgroup.subtree_control" */
> +	struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
>  
>  	/*
>  	 * Monotonically increasing unique serial number which defines a
> @@ -264,6 +273,12 @@ enum {
>  	 *
>  	 * - "cgroup.clone_children" is removed.
>  	 *
> +	 * - "cgroup.subtree_populated" is available.  Its value is 0 if
> +	 *   the cgroup and its descendants contain no task; otherwise, 1.
> +	 *   The file also generates kernfs notification which can be
> +	 *   monitored through poll and [di]notify when the value of the
> +	 *   file changes.
> +	 *
>  	 * - If mount is requested with sane_behavior but without any
>  	 *   subsystem, the default unified hierarchy is mounted.
>  	 *
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 4e958c7..17f0a09 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -411,6 +411,43 @@ static struct css_set init_css_set = {
>  
>  static int css_set_count	= 1;	/* 1 for init_css_set */
>  
> +/**
> + * cgroup_update_populated - updated populated count of a cgroup
> + * @cgrp: the target cgroup
> + * @populated: inc or dec populated count
> + *
> + * @cgrp is either getting the first task (css_set) or losing the last.
> + * Update @cgrp->populated_cnt accordingly.  The count is propagated
> + * towards root so that a given cgroup's populated_cnt is zero iff the
> + * cgroup and all its descendants are empty.
> + *
> + * @cgrp's interface file "cgroup.subtree_populated" is zero if
> + * @cgrp->populated_cnt is zero and 1 otherwise.  When @cgrp->populated_cnt
> + * changes from or to zero, userland is notified that the content of the
> + * interface file has changed.  This can be used to detect when @cgrp and
> + * its descendants become populated or empty.
> + */
> +static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
> +{
> +	lockdep_assert_held(&css_set_rwsem);
> +
> +	do {
> +		bool trigger;
> +
> +		if (populated)
> +			trigger = !cgrp->populated_cnt++;
> +		else
> +			trigger = !--cgrp->populated_cnt;
> +
> +		if (!trigger)
> +			break;
> +
> +		if (cgrp->populated_kn)
> +			kernfs_notify(cgrp->populated_kn);
> +		cgrp = cgrp->parent;
> +	} while (cgrp);
> +}
> +
>  /*
>   * hash table for cgroup groups. This improves the performance to find
>   * an existing css_set. This hash doesn't (currently) take into
> @@ -456,10 +493,13 @@ static void put_css_set_locked(struct css_set *cset, bool taskexit)
>  		list_del(&link->cgrp_link);
>  
>  		/* @cgrp can't go away while we're holding css_set_rwsem */
> -		if (list_empty(&cgrp->cset_links) && notify_on_release(cgrp)) {
> -			if (taskexit)
> -				set_bit(CGRP_RELEASABLE, &cgrp->flags);
> -			check_for_release(cgrp);
> +		if (list_empty(&cgrp->cset_links)) {
> +			cgroup_update_populated(cgrp, false);
> +			if (notify_on_release(cgrp)) {
> +				if (taskexit)
> +					set_bit(CGRP_RELEASABLE, &cgrp->flags);
> +				check_for_release(cgrp);
> +			}
>  		}
>  
>  		kfree(link);
> @@ -668,7 +708,11 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
>  	link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link);
>  	link->cset = cset;
>  	link->cgrp = cgrp;
> +
> +	if (list_empty(&cgrp->cset_links))
> +		cgroup_update_populated(cgrp, true);
>  	list_move(&link->cset_link, &cgrp->cset_links);
> +
>  	/*
>  	 * Always add links to the tail of the list so that the list
>  	 * is sorted by order of hierarchy creation
> @@ -2633,6 +2677,12 @@ err_undo_css:
>  	goto out_unlock;
>  }
>  
> +static int cgroup_subtree_populated_show(struct seq_file *seq, void *v)
> +{
> +	seq_printf(seq, "%d\n", (bool)seq_css(seq)->cgroup->populated_cnt);
> +	return 0;
> +}
> +
>  static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf,
>  				 size_t nbytes, loff_t off)
>  {
> @@ -2775,6 +2825,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
>  				  NULL, false, key);
>  	if (cft->seq_show == cgroup_subtree_control_show)
>  		cgrp->control_kn = kn;
> +	else if (cft->seq_show == cgroup_subtree_populated_show)
> +		cgrp->populated_kn = kn;
>  	return PTR_ERR_OR_ZERO(kn);
>  }
>  
> @@ -3883,6 +3935,11 @@ static struct cftype cgroup_base_files[] = {
>  		.seq_show = cgroup_subtree_control_show,
>  		.write_string = cgroup_subtree_control_write,
>  	},
> +	{
> +		.name = "cgroup.subtree_populated",
> +		.flags = CFTYPE_ONLY_ON_DFL | CFTYPE_NOT_ON_ROOT,
> +		.seq_show = cgroup_subtree_populated_show,
> +	},
>  
>  	/*
>  	 * Historical crazy stuff.  These don't have "cgroup."  prefix and
> -- 
> 1.9.0
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

  reply	other threads:[~2014-04-10  3:08 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-09 15:07 [PATCHSET cgroup/for-3.16] cgroup: implement cgroup.populated Tejun Heo
2014-04-09 15:07 ` Tejun Heo
     [not found] ` <1397056052-2829-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2014-04-09 15:07   ` [PATCH 1/3] kernfs: implement kernfs_root->supers list Tejun Heo
2014-04-09 15:07     ` Tejun Heo
2014-04-09 15:07   ` [PATCH 2/3] kernfs: make kernfs_notify() trigger inotify events too Tejun Heo
2014-04-09 15:07     ` Tejun Heo
2014-04-09 15:07   ` [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy Tejun Heo
2014-04-09 15:07     ` Tejun Heo
2014-04-10  3:08     ` Serge E. Hallyn [this message]
     [not found]       ` <20140410030855.GA29658-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2014-04-10 13:08         ` Tejun Heo
2014-04-10 13:08           ` Tejun Heo
     [not found]           ` <20140410130831.GA25308-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-10 14:04             ` Serge Hallyn
2014-04-10 14:04               ` Serge Hallyn
2014-04-10 14:19               ` Tejun Heo
2014-04-10 14:19                 ` Tejun Heo
     [not found]                 ` <20140410141957.GE25308-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-11  9:00                   ` Li Zefan
2014-04-11  9:00                     ` Li Zefan
     [not found]     ` <1397056052-2829-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2014-04-10  3:08       ` Serge E. Hallyn
2014-04-14 21:31   ` [PATCHSET cgroup/for-3.16] cgroup: implement cgroup.populated Tejun Heo
2014-04-14 21:31     ` Tejun Heo
     [not found]     ` <20140414213100.GA1863-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-14 22:26       ` Greg KH
2014-04-14 22:26         ` Greg KH
     [not found]         ` <20140414222658.GA18152-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2014-04-15 16:18           ` Tejun Heo
2014-04-15 16:18             ` Tejun Heo
     [not found]             ` <20140415161828.GA30990-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-23 15:16               ` Tejun Heo
2014-04-23 15:16                 ` Tejun Heo
     [not found]                 ` <20140423151638.GG4781-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-25 18:57                   ` Greg KH
2014-04-25 18:57                   ` Greg KH
2014-04-25 18:57                     ` Greg KH
2014-04-25 22:30   ` Tejun Heo
2014-04-25 22:30     ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2014-04-14 21:44 [PATCHSET cgroup/for-3.16] cgroup: implement cgroup.populated, v2 Tejun Heo
     [not found] ` <1397511846-2904-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2014-04-14 21:44   ` [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy Tejun Heo
2014-04-14 21:44     ` Tejun Heo
     [not found]     ` <1397511846-2904-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2014-04-15  0:57       ` Li Zefan
2014-04-15  0:57         ` Li Zefan
     [not found]         ` <534C83F1.9020106-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-04-15 14:54           ` Tejun Heo
2014-04-15 14:54           ` Tejun Heo
2014-04-15 14:54             ` Tejun Heo
     [not found]             ` <20140415145450.GL1863-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-15 16:52               ` Tejun Heo
2014-04-15 16:52                 ` Tejun Heo
     [not found]                 ` <20140415165221.GD30990-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-04-16  1:30                   ` Li Zefan
2014-04-16  1:30                   ` Li Zefan
2014-04-16  1:30                     ` Li Zefan
2014-04-16  2:48       ` Li Zefan
2014-04-16  2:48         ` Li Zefan
     [not found]         ` <534DEF62.4090900-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-04-16  3:33           ` Kay Sievers
2014-04-16  3:33             ` Kay Sievers
     [not found]             ` <CAPXgP12kvPdX0QExwN2JphDfEW=d+7K2c_Y8DbomGd=YVy=VGQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-04-16  3:50               ` Eric W. Biederman
2014-04-16  3:50                 ` Eric W. Biederman
     [not found]                 ` <87tx9uhr0j.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-04-16  4:15                   ` Kay Sievers
2014-04-16  4:15                     ` Kay Sievers
2014-04-16  4:20                   ` Li Zefan
2014-04-16  4:20                     ` Li Zefan
2014-04-16  3:50               ` Eric W. Biederman
2014-04-16  4:16               ` Li Zefan
2014-04-16  4:16                 ` Li Zefan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140410030855.GA29658@mail.hallyn.com \
    --to=serge@hallyn.com \
    --cc=cgroups@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=eparis@parisplace.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=john@johnmccutchan.com \
    --cc=kay@vrfy.org \
    --cc=lennart@poettering.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rlove@rlove.org \
    --cc=serge.hallyn@ubuntu.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.