From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com, hannes@cmpxchg.org, mhocko@suse.cz,
bsingharora@gmail.com, kamezawa.hiroyu@jp.fujitsu.com
Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Glauber Costa <glommer@parallels.com>
Subject: [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()
Date: Wed, 31 Oct 2012 12:44:06 -0700
Message-ID: <1351712650-23709-5-git-send-email-tj@kernel.org>
In-Reply-To: <1351712650-23709-1-git-send-email-tj@kernel.org>

Because ->pre_destroy() could fail and can't be called under
cgroup_mutex, cgroup destruction did something very ugly.

1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
2. Release cgroup_mutex and call ->pre_destroy().
3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
   otherwise.
4. Continue destroying.

In addition to being ugly, it has always been broken in various ways.
For example, memcg ->pre_destroy() expects the cgroup to be inactive
after it's done, but tasks can be attached and detached between #2 and
#3, and the conditions that memcg verified in ->pre_destroy() might no
longer hold by the time control reaches #3.

Now that ->pre_destroy() is no longer allowed to fail, we can switch
to the following.

1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
2. Deactivate CSS's and mark the cgroup removed, thus preventing any
   further operations which can invalidate the verification from #1.
3. Release cgroup_mutex and call ->pre_destroy().
4. Re-grab cgroup_mutex and continue destroying.
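
The four steps above can be sketched in user space roughly as follows.
This is only a model under stated assumptions, not the kernel code:
struct model_cgroup, model_attach(), model_rmdir() and
racing_attach_hook() are hypothetical names, a pthread mutex stands in
for cgroup_mutex, and a plain bool stands in for CGRP_REMOVED.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; not the kernel's struct cgroup. */
struct model_cgroup {
	int task_count;		/* models cgrp->count */
	int child_count;	/* models a non-empty cgrp->children */
	bool removed;		/* models the CGRP_REMOVED flag */
	int pre_destroy_calls;	/* how often the hook was invoked */
};

/* Models cgroup_mutex. */
static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Attaching a task fails once the group has been marked removed. */
static int model_attach(struct model_cgroup *cg)
{
	int ret = 0;

	pthread_mutex_lock(&model_mutex);
	if (cg->removed)
		ret = -ENODEV;
	else
		cg->task_count++;
	pthread_mutex_unlock(&model_mutex);
	return ret;
}

/* Test helper: a hook that simulates an attach racing with removal. */
static int last_attach_result;

static void racing_attach_hook(struct model_cgroup *cg)
{
	last_attach_result = model_attach(cg);
}

static int model_rmdir(struct model_cgroup *cg,
		       void (*pre_destroy)(struct model_cgroup *))
{
	/* 1. Grab the mutex and verify the group can be destroyed. */
	pthread_mutex_lock(&model_mutex);
	if (cg->removed) {
		pthread_mutex_unlock(&model_mutex);
		return -ENOENT;	/* model-only guard against a second removal */
	}
	if (cg->task_count || cg->child_count) {
		pthread_mutex_unlock(&model_mutex);
		return -EBUSY;
	}

	/* 2. Mark the group removed so the checks above cannot be undone. */
	cg->removed = true;

	/* 3. Drop the mutex and call the hook, which may no longer fail. */
	pthread_mutex_unlock(&model_mutex);
	cg->pre_destroy_calls++;
	if (pre_destroy)
		pre_destroy(cg);

	/* 4. Re-grab the mutex; the conditions from step 1 still hold. */
	pthread_mutex_lock(&model_mutex);
	assert(cg->task_count == 0 && cg->child_count == 0);
	/* final teardown would go here */
	pthread_mutex_unlock(&model_mutex);
	return 0;
}
```

In this model, an attach attempted from inside the hook returns
-ENODEV, which is the property the old release-and-recheck sequence
could not provide.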

After this change, controllers can safely assume that ->pre_destroy()
will be called only once for a given cgroup and, once ->pre_destroy()
is called, the cgroup will stay dormant till it's destroyed.
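
One way to picture why the cgroup stays dormant is the biased
reference count that the patch applies via CSS_DEACT_BIAS. The sketch
below is a user-space approximation with C11 atomics, not the kernel
implementation: struct model_css, the model_*() helpers and the choice
of INT_MIN for MODEL_DEACT_BIAS are all assumptions made for
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: any bias that drives the counter negative would do. */
#define MODEL_DEACT_BIAS INT_MIN

/* Hypothetical stand-in for a cgroup_subsys_state. */
struct model_css {
	atomic_int refcnt;
};

/* Take a reference only while the counter is non-negative (active). */
static bool model_tryget(struct model_css *css)
{
	int v = atomic_load(&css->refcnt);

	while (v >= 0) {
		/* On failure, v is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&css->refcnt, &v, v + 1))
			return true;
	}
	return false;	/* deactivated: no new references */
}

/* Drop a reference; existing holders can still do this after deactivation. */
static void model_put(struct model_css *css)
{
	atomic_fetch_sub(&css->refcnt, 1);
}

/* Add the bias so every later model_tryget() sees a negative counter. */
static void model_deactivate(struct model_css *css)
{
	assert(atomic_load(&css->refcnt) >= 0);
	atomic_fetch_add(&css->refcnt, MODEL_DEACT_BIAS);
}

/* Number of references still held, with the bias removed if present. */
static int model_live_refs(struct model_css *css)
{
	int v = atomic_load(&css->refcnt);

	return v >= 0 ? v : v - MODEL_DEACT_BIAS;
}
```

Existing references can still be dropped after deactivation, while
new model_tryget() calls fail, so the count can only go down from
there.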

This removes the only reason ->pre_destroy() could fail: a new task
being attached or a child cgroup being created in between. The error
path is removed and the ->pre_destroy() invocation is open coded in
cgroup_rmdir().

v2: cgroup_call_pre_destroy() removal moved to this patch per Michal.
    Commit message updated per Glauber.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@parallels.com>
---
kernel/cgroup.c | 65 +++++++++++++++++----------------------------------------
1 file changed, 19 insertions(+), 46 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f22e3cd..66204a6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -851,27 +851,6 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
return inode;
}
-/*
- * Call subsys's pre_destroy handler.
- * This is called before css refcnt check.
- */
-static int cgroup_call_pre_destroy(struct cgroup *cgrp)
-{
- struct cgroup_subsys *ss;
- int ret = 0;
-
- for_each_subsys(cgrp->root, ss) {
- if (!ss->pre_destroy)
- continue;
-
- ret = ss->pre_destroy(cgrp);
- if (WARN_ON_ONCE(ret))
- break;
- }
-
- return ret;
-}
-
static void cgroup_diput(struct dentry *dentry, struct inode *inode)
{
/* is dentry a directory ? if so, kfree() associated cgroup */
@@ -4078,19 +4057,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
DEFINE_WAIT(wait);
struct cgroup_event *event, *tmp;
struct cgroup_subsys *ss;
- int ret;
-
- /* the vfs holds both inode->i_mutex already */
- mutex_lock(&cgroup_mutex);
- if (atomic_read(&cgrp->count) != 0) {
- mutex_unlock(&cgroup_mutex);
- return -EBUSY;
- }
- if (!list_empty(&cgrp->children)) {
- mutex_unlock(&cgroup_mutex);
- return -EBUSY;
- }
- mutex_unlock(&cgroup_mutex);
/*
* In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4103,16 +4069,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
*/
set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
- /*
- * Call pre_destroy handlers of subsys. Notify subsystems
- * that rmdir() request comes.
- */
- ret = cgroup_call_pre_destroy(cgrp);
- if (ret) {
- clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
- return ret;
- }
-
+ /* the vfs holds both inode->i_mutex already */
mutex_lock(&cgroup_mutex);
parent = cgrp->parent;
if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
@@ -4122,13 +4079,30 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
}
prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
- /* block new css_tryget() by deactivating refcnt */
+ /*
+ * Block new css_tryget() by deactivating refcnt and mark @cgrp
+ * removed. This makes future css_tryget() and child creation
+ * attempts fail thus maintaining the removal conditions verified
+ * above.
+ */
for_each_subsys(cgrp->root, ss) {
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
WARN_ON(atomic_read(&css->refcnt) < 0);
atomic_add(CSS_DEACT_BIAS, &css->refcnt);
}
+ set_bit(CGRP_REMOVED, &cgrp->flags);
+
+ /*
+ * Tell subsystems to initiate destruction. pre_destroy() should be
+ * called with cgroup_mutex unlocked. See 3fa59dfbc3 ("cgroup: fix
+ * potential deadlock in pre_destroy") for details.
+ */
+ mutex_unlock(&cgroup_mutex);
+ for_each_subsys(cgrp->root, ss)
+ if (ss->pre_destroy)
+ WARN_ON_ONCE(ss->pre_destroy(cgrp));
+ mutex_lock(&cgroup_mutex);
/*
* Put all the base refs. Each css holds an extra reference to the
@@ -4144,7 +4118,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
raw_spin_lock(&release_list_lock);
- set_bit(CGRP_REMOVED, &cgrp->flags);
if (!list_empty(&cgrp->release_list))
list_del_init(&cgrp->release_list);
raw_spin_unlock(&release_list_lock);
--
1.7.11.7