From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
mhocko-AlSwsSmVLrQ@public.gmane.org,
bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()
Date: Tue, 30 Oct 2012 21:22:41 -0700 [thread overview]
Message-ID: <1351657365-25055-5-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1351657365-25055-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Because ->pre_destroy() could fail and can't be called under
cgroup_mutex, cgroup destruction did something very ugly.
1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.
2. Release cgroup_mutex and call ->pre_destroy().
3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
otherwise.
4. Continue destroying.
In addition to being ugly, it has been always broken in various ways.
For example, memcg ->pre_destroy() expects the cgroup to be inactive
after it's done but tasks can be attached and detached between #2 and
#3 and the conditions that memcg verified in ->pre_destroy() might no
longer hold by the time control reaches #3.
Now that ->pre_destroy() is no longer allowed to fail. We can switch
to the following.
1. Grab cgroup_mutex and fail if it can't be destroyed; fail
otherwise.
2. Deactivate CSS's and mark the cgroup removed thus preventing any
further operations which can invalidate the verification from #1.
3. Release cgroup_mutex and call ->pre_destroy().
4. Re-grab cgroup_mutex and continue destroying.
After this change, controllers can safely assume that ->pre_destroy()
will only be called only once for a given cgroup and, once
->pre_destroy() is called, the cgroup will stay dormant till it's
destroyed.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup.c | 41 +++++++++++++++++++----------------------
1 file changed, 19 insertions(+), 22 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b3010ae..66204a6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4058,18 +4058,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
struct cgroup_event *event, *tmp;
struct cgroup_subsys *ss;
- /* the vfs holds both inode->i_mutex already */
- mutex_lock(&cgroup_mutex);
- if (atomic_read(&cgrp->count) != 0) {
- mutex_unlock(&cgroup_mutex);
- return -EBUSY;
- }
- if (!list_empty(&cgrp->children)) {
- mutex_unlock(&cgroup_mutex);
- return -EBUSY;
- }
- mutex_unlock(&cgroup_mutex);
-
/*
* In general, subsystem has no css->refcnt after pre_destroy(). But
* in racy cases, subsystem may have to get css->refcnt after
@@ -4081,14 +4069,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
*/
set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
- /*
- * Call pre_destroy handlers of subsys. Notify subsystems
- * that rmdir() request comes.
- */
- for_each_subsys(cgrp->root, ss)
- if (ss->pre_destroy)
- WARN_ON_ONCE(ss->pre_destroy(cgrp));
-
+ /* the vfs holds both inode->i_mutex already */
mutex_lock(&cgroup_mutex);
parent = cgrp->parent;
if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
@@ -4098,13 +4079,30 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
}
prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
- /* block new css_tryget() by deactivating refcnt */
+ /*
+ * Block new css_tryget() by deactivating refcnt and mark @cgrp
+ * removed. This makes future css_tryget() and child creation
+ * attempts fail thus maintaining the removal conditions verified
+ * above.
+ */
for_each_subsys(cgrp->root, ss) {
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
WARN_ON(atomic_read(&css->refcnt) < 0);
atomic_add(CSS_DEACT_BIAS, &css->refcnt);
}
+ set_bit(CGRP_REMOVED, &cgrp->flags);
+
+ /*
+ * Tell subsystems to initate destruction. pre_destroy() should be
+ * called with cgroup_mutex unlocked. See 3fa59dfbc3 ("cgroup: fix
+ * potential deadlock in pre_destroy") for details.
+ */
+ mutex_unlock(&cgroup_mutex);
+ for_each_subsys(cgrp->root, ss)
+ if (ss->pre_destroy)
+ WARN_ON_ONCE(ss->pre_destroy(cgrp));
+ mutex_lock(&cgroup_mutex);
/*
* Put all the base refs. Each css holds an extra reference to the
@@ -4120,7 +4118,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
raw_spin_lock(&release_list_lock);
- set_bit(CGRP_REMOVED, &cgrp->flags);
if (!list_empty(&cgrp->release_list))
list_del_init(&cgrp->release_list);
raw_spin_unlock(&release_list_lock);
--
1.7.11.7
next prev parent reply other threads:[~2012-10-31 4:22 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-31 4:22 [PATCHSET] cgroup: simplify cgroup removal path Tejun Heo
[not found] ` <1351657365-25055-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 4:22 ` [PATCH 1/8] cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs Tejun Heo
[not found] ` <1351657365-25055-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:21 ` Glauber Costa
[not found] ` <509125D9.8070100-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 16:38 ` Tejun Heo
2012-10-31 14:37 ` Michal Hocko
[not found] ` <20121031143751.GA22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 16:41 ` Tejun Heo
[not found] ` <20121031164123.GD2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 16:48 ` Michal Hocko
[not found] ` <20121031164855.GI22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 17:22 ` Tejun Heo
2012-11-02 9:23 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 2/8] cgroup: kill CSS_REMOVED Tejun Heo
[not found] ` <1351657365-25055-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 15:39 ` Michal Hocko
[not found] ` <20121031153926.GC22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 16:57 ` Tejun Heo
[not found] ` <20121031165739.GE2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:06 ` Glauber Costa
[not found] ` <50915A87.4070504-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:10 ` Tejun Heo
[not found] ` <CAOS58YP=CjTPFdETLRXnc3gXEzX2=EEe2dMdSh3Eov7zRfV4Qg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-31 17:19 ` Glauber Costa
[not found] ` <50915DB7.5020706-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:25 ` Tejun Heo
[not found] ` <20121031172522.GJ2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:38 ` Glauber Costa
[not found] ` <50916218.3090301-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:44 ` Tejun Heo
2012-10-31 17:39 ` Glauber Costa
2012-10-31 19:16 ` Michal Hocko
[not found] ` <20121031191602.GB1271-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 19:33 ` Tejun Heo
2012-11-02 9:30 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 3/8] cgroup: use cgroup_lock_live_group(parent) in cgroup_create() Tejun Heo
[not found] ` <1351657365-25055-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 15:55 ` Michal Hocko
[not found] ` <20121031155514.GD22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 17:04 ` Tejun Heo
[not found] ` <20121031170431.GF2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-11-01 9:16 ` Michal Hocko
[not found] ` <20121101091644.GA8533-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-01 14:52 ` Tejun Heo
[not found] ` <CAOS58YM+kRtspVUzmnSmOmrDoNS_kF6KA02zWGxqH5FUcRWo1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-01 15:05 ` Michal Hocko
[not found] ` <20121101150532.GA5065-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-01 15:15 ` Michal Hocko
[not found] ` <20121101151556.GB5065-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-01 15:43 ` Tejun Heo
2012-11-02 9:37 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` Tejun Heo [this message]
[not found] ` <1351657365-25055-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:42 ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Glauber Costa
2012-10-31 16:05 ` Michal Hocko
2012-11-02 9:43 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() Tejun Heo
[not found] ` <1351657365-25055-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 16:27 ` Michal Hocko
[not found] ` <20121031162735.GF22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 17:16 ` Tejun Heo
2012-11-02 9:53 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 6/8] memcg: make mem_cgroup_reparent_charges non failing Tejun Heo
[not found] ` <1351657365-25055-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 9:54 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 7/8] hugetlb: do not fail in hugetlb_cgroup_pre_destroy Tejun Heo
[not found] ` <1351657365-25055-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 9:56 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 8/8] cgroup: make ->pre_destroy() return void Tejun Heo
[not found] ` <1351657365-25055-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:57 ` Vivek Goyal
2012-10-31 16:28 ` Michal Hocko
2012-11-02 9:57 ` Kamezawa Hiroyuki
2012-10-31 13:49 ` [PATCHSET] cgroup: simplify cgroup removal path Glauber Costa
[not found] ` <50912C6D.6020000-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:18 ` Tejun Heo
[not found] ` <20121031171849.GH2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:24 ` Glauber Costa
[not found] ` <50915EB6.3060704-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:26 ` Tejun Heo
[not found] ` <20121031172617.GK2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:33 ` Glauber Costa
2012-10-31 16:31 ` Michal Hocko
[not found] ` <20121031163134.GH22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 16:35 ` Tejun Heo
-- strict thread matches above, loose matches on Subject: below --
2012-10-31 18:16 [PATCHSET v2] " Tejun Heo
[not found] ` <1351707391-22287-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 18:16 ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Tejun Heo
2012-10-31 19:44 [PATCHSET RESEND v2] cgroup: simplify cgroup removal path Tejun Heo
2012-10-31 19:44 ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Tejun Heo
[not found] ` <1351712650-23709-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 21:23 ` Michal Hocko
[not found] ` <20121031212359.GC5286-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 21:27 ` Tejun Heo
[not found] ` <20121031212725.GA2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-11-01 8:58 ` Michal Hocko
2012-11-02 10:05 ` Kamezawa Hiroyuki
2012-11-05 5:37 ` Li Zefan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1351657365-25055-5-git-send-email-tj@kernel.org \
--to=tj-dgejt+ai2ygdnm+yrofe0a@public.gmane.org \
--cc=bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org \
--cc=kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org \
--cc=mhocko-AlSwsSmVLrQ@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).