cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com, hannes@cmpxchg.org, mhocko@suse.cz,
	bsingharora@gmail.com, kamezawa.hiroyu@jp.fujitsu.com
Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Glauber Costa <glommer@parallels.com>
Subject: [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()
Date: Wed, 31 Oct 2012 12:44:06 -0700	[thread overview]
Message-ID: <1351712650-23709-5-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1351712650-23709-1-git-send-email-tj@kernel.org>

Because ->pre_destroy() could fail and can't be called under
cgroup_mutex, cgroup destruction did something very ugly.

  1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.

  2. Release cgroup_mutex and call ->pre_destroy().

  3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
     otherwise.

  4. Continue destroying.

In addition to being ugly, it has been always broken in various ways.
For example, memcg ->pre_destroy() expects the cgroup to be inactive
after it's done but tasks can be attached and detached between #2 and
#3 and the conditions that memcg verified in ->pre_destroy() might no
longer hold by the time control reaches #3.

Now that ->pre_destroy() is no longer allowed to fail.  We can switch
to the following.

  1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.

  2. Deactivate CSS's and mark the cgroup removed thus preventing any
     further operations which can invalidate the verification from #1.

  3. Release cgroup_mutex and call ->pre_destroy().

  4. Re-grab cgroup_mutex and continue destroying.

After this change, controllers can safely assume that ->pre_destroy()
will only be called only once for a given cgroup and, once
->pre_destroy() is called, the cgroup will stay dormant till it's
destroyed.

This removes the only reason ->pre_destroy() can fail - new task being
attached or child cgroup being created inbetween.  Error out path is
removed and ->pre_destroy() invocation is open coded in
cgroup_rmdir().

v2: cgroup_call_pre_destroy() removal moved to this patch per Michal.
    Commit message updated per Glauber.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
---
 kernel/cgroup.c | 65 +++++++++++++++++----------------------------------------
 1 file changed, 19 insertions(+), 46 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f22e3cd..66204a6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -851,27 +851,6 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
 	return inode;
 }
 
-/*
- * Call subsys's pre_destroy handler.
- * This is called before css refcnt check.
- */
-static int cgroup_call_pre_destroy(struct cgroup *cgrp)
-{
-	struct cgroup_subsys *ss;
-	int ret = 0;
-
-	for_each_subsys(cgrp->root, ss) {
-		if (!ss->pre_destroy)
-			continue;
-
-		ret = ss->pre_destroy(cgrp);
-		if (WARN_ON_ONCE(ret))
-			break;
-	}
-
-	return ret;
-}
-
 static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 {
 	/* is dentry a directory ? if so, kfree() associated cgroup */
@@ -4078,19 +4057,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 	DEFINE_WAIT(wait);
 	struct cgroup_event *event, *tmp;
 	struct cgroup_subsys *ss;
-	int ret;
-
-	/* the vfs holds both inode->i_mutex already */
-	mutex_lock(&cgroup_mutex);
-	if (atomic_read(&cgrp->count) != 0) {
-		mutex_unlock(&cgroup_mutex);
-		return -EBUSY;
-	}
-	if (!list_empty(&cgrp->children)) {
-		mutex_unlock(&cgroup_mutex);
-		return -EBUSY;
-	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4103,16 +4069,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 	 */
 	set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
 
-	/*
-	 * Call pre_destroy handlers of subsys. Notify subsystems
-	 * that rmdir() request comes.
-	 */
-	ret = cgroup_call_pre_destroy(cgrp);
-	if (ret) {
-		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
-		return ret;
-	}
-
+	/* the vfs holds both inode->i_mutex already */
 	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
@@ -4122,13 +4079,30 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 	}
 	prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
 
-	/* block new css_tryget() by deactivating refcnt */
+	/*
+	 * Block new css_tryget() by deactivating refcnt and mark @cgrp
+	 * removed.  This makes future css_tryget() and child creation
+	 * attempts fail thus maintaining the removal conditions verified
+	 * above.
+	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
 
 		WARN_ON(atomic_read(&css->refcnt) < 0);
 		atomic_add(CSS_DEACT_BIAS, &css->refcnt);
 	}
+	set_bit(CGRP_REMOVED, &cgrp->flags);
+
+	/*
+	 * Tell subsystems to initate destruction.  pre_destroy() should be
+	 * called with cgroup_mutex unlocked.  See 3fa59dfbc3 ("cgroup: fix
+	 * potential deadlock in pre_destroy") for details.
+	 */
+	mutex_unlock(&cgroup_mutex);
+	for_each_subsys(cgrp->root, ss)
+		if (ss->pre_destroy)
+			WARN_ON_ONCE(ss->pre_destroy(cgrp));
+	mutex_lock(&cgroup_mutex);
 
 	/*
 	 * Put all the base refs.  Each css holds an extra reference to the
@@ -4144,7 +4118,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 	clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
 
 	raw_spin_lock(&release_list_lock);
-	set_bit(CGRP_REMOVED, &cgrp->flags);
 	if (!list_empty(&cgrp->release_list))
 		list_del_init(&cgrp->release_list);
 	raw_spin_unlock(&release_list_lock);
-- 
1.7.11.7

  parent reply	other threads:[~2012-10-31 19:44 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-31 19:44 [PATCHSET RESEND v2] cgroup: simplify cgroup removal path Tejun Heo
     [not found] ` <1351712650-23709-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 19:44   ` [PATCH 1/8] cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs Tejun Heo
     [not found]     ` <1351712650-23709-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 20:08       ` Michal Hocko
     [not found]         ` <20121031200859.GE1271-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 20:11           ` Tejun Heo
     [not found]             ` <CAOS58YPbYoMJ1+3uRfK_ZERyZoaby=FPW7uTpp8dVOSgYC8Mrw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-31 20:14               ` Michal Hocko
     [not found]                 ` <20121031201415.GG1271-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 20:24                   ` Tejun Heo
     [not found]                     ` <20121031202400.GT2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 20:49                       ` Michal Hocko
2012-11-02 10:01                       ` Kamezawa Hiroyuki
2012-10-31 20:12       ` Michal Hocko
     [not found]         ` <20121031201227.GF1271-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 20:14           ` Tejun Heo
     [not found]             ` <CAOS58YOHjLyKFeah+h+qOrAWvfi1O5eL7m-AMbqAdcP=EOFb6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-31 21:23               ` Michal Hocko
2012-11-05  5:34       ` Li Zefan
2012-10-31 19:44   ` [PATCH 2/8] cgroup: kill CSS_REMOVED Tejun Heo
     [not found]     ` <1351712650-23709-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 10:02       ` Kamezawa Hiroyuki
2012-11-05  5:33       ` Li Zefan
2012-10-31 19:44   ` [PATCH 3/8] cgroup: use cgroup_lock_live_group(parent) in cgroup_create() Tejun Heo
     [not found]     ` <1351712650-23709-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 10:03       ` Kamezawa Hiroyuki
2012-11-05  5:36       ` Li Zefan
2012-10-31 19:44   ` [PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() Tejun Heo
     [not found]     ` <1351712650-23709-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 10:20       ` Kamezawa Hiroyuki
2012-11-05  5:40       ` Li Zefan
2012-10-31 19:44   ` [PATCH 6/8] memcg: make mem_cgroup_reparent_charges non failing Tejun Heo
     [not found]     ` <1351712650-23709-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 10:21       ` Kamezawa Hiroyuki
2012-10-31 19:44   ` [PATCH 7/8] hugetlb: do not fail in hugetlb_cgroup_pre_destroy Tejun Heo
     [not found]     ` <1351712650-23709-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 10:23       ` Kamezawa Hiroyuki
2012-11-05 17:30   ` [PATCHSET RESEND v2] cgroup: simplify cgroup removal path Tejun Heo
     [not found]     ` <20121105173024.GA19354-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2012-11-05 18:39       ` Michal Hocko
2012-10-31 19:44 ` Tejun Heo [this message]
     [not found]   ` <1351712650-23709-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 21:23     ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Michal Hocko
     [not found]       ` <20121031212359.GC5286-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 21:27         ` Tejun Heo
     [not found]           ` <20121031212725.GA2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-11-01  8:58             ` Michal Hocko
2012-11-02 10:05     ` Kamezawa Hiroyuki
2012-11-05  5:37     ` Li Zefan
2012-10-31 19:44 ` [PATCH 8/8] cgroup: make ->pre_destroy() return void Tejun Heo
     [not found]   ` <1351712650-23709-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 21:23     ` Michal Hocko
2012-11-02 10:24     ` Kamezawa Hiroyuki
2012-11-05  5:41     ` Li Zefan
  -- strict thread matches above, loose matches on Subject: below --
2012-10-31 18:16 [PATCHSET v2] cgroup: simplify cgroup removal path Tejun Heo
     [not found] ` <1351707391-22287-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 18:16   ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Tejun Heo
2012-10-31  4:22 [PATCHSET] cgroup: simplify cgroup removal path Tejun Heo
     [not found] ` <1351657365-25055-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31  4:22   ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Tejun Heo
     [not found]     ` <1351657365-25055-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:42       ` Glauber Costa
2012-10-31 16:05       ` Michal Hocko
2012-11-02  9:43       ` Kamezawa Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1351712650-23709-5-git-send-email-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=glommer@parallels.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).