From: Kamezawa Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
mhocko-AlSwsSmVLrQ@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Subject: Re: [PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir()
Date: Fri, 02 Nov 2012 18:53:05 +0900 [thread overview]
Message-ID: <50939801.4000504@jp.fujitsu.com> (raw)
In-Reply-To: <1351657365-25055-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
(2012/10/31 13:22), Tejun Heo wrote:
> CGRP_WAIT_ON_RMDIR is another kludge which was added to make cgroup
> destruction rollback somewhat working. cgroup_rmdir() used to drain
> CSS references and CGRP_WAIT_ON_RMDIR and the associated waitqueue and
> helpers were used to allow the task performing rmdir to wait for the
> next relevant event.
>
> Unfortunately, the wait is visible to controllers too and the
> mechanism got exposed to memcg by 887032670d ("cgroup avoid permanent
> sleep at rmdir").
>
> Now that the draining and retries are gone, CGRP_WAIT_ON_RMDIR is
> unnecessary. Remove it and all the mechanisms supporting it. Note
> that memcontrol.c changes are essentially revert of 887032670d
> ("cgroup avoid permanent sleep at rmdir").
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Hmm, ok...
> ---
> include/linux/cgroup.h | 21 ---------------------
> kernel/cgroup.c | 51 --------------------------------------------------
> mm/memcontrol.c | 24 +-----------------------
> 3 files changed, 1 insertion(+), 95 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index a309804..47868a8 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -145,10 +145,6 @@ enum {
> /* Control Group requires release notifications to userspace */
> CGRP_NOTIFY_ON_RELEASE,
> /*
> - * A thread in rmdir() is wating for this cgroup.
> - */
> - CGRP_WAIT_ON_RMDIR,
> - /*
> * Clone cgroup values when creating a new child cgroup
> */
> CGRP_CLONE_CHILDREN,
> @@ -412,23 +408,6 @@ int cgroup_task_count(const struct cgroup *cgrp);
> int cgroup_is_descendant(const struct cgroup *cgrp, struct task_struct *task);
>
> /*
> - * When the subsys has to access css and may add permanent refcnt to css,
> - * it should take care of racy conditions with rmdir(). Following set of
> - * functions, is for stop/restart rmdir if necessary.
> - * Because these will call css_get/put, "css" should be alive css.
> - *
> - * cgroup_exclude_rmdir();
> - * ...do some jobs which may access arbitrary empty cgroup
> - * cgroup_release_and_wakeup_rmdir();
> - *
> - * When someone removes a cgroup while cgroup_exclude_rmdir() holds it,
> - * it sleeps and cgroup_release_and_wakeup_rmdir() will wake him up.
> - */
> -
> -void cgroup_exclude_rmdir(struct cgroup_subsys_state *css);
> -void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
> -
> -/*
> * Control Group taskset, used to pass around set of tasks to cgroup_subsys
> * methods.
> */
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 66204a6..c5f6fb2 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -966,33 +966,6 @@ static void cgroup_d_remove_dir(struct dentry *dentry)
> }
>
> /*
> - * A queue for waiters to do rmdir() cgroup. A tasks will sleep when
> - * cgroup->count == 0 && list_empty(&cgroup->children) && subsys has some
> - * reference to css->refcnt. In general, this refcnt is expected to goes down
> - * to zero, soon.
> - *
> - * CGRP_WAIT_ON_RMDIR flag is set under cgroup's inode->i_mutex;
> - */
> -static DECLARE_WAIT_QUEUE_HEAD(cgroup_rmdir_waitq);
> -
> -static void cgroup_wakeup_rmdir_waiter(struct cgroup *cgrp)
> -{
> - if (unlikely(test_and_clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags)))
> - wake_up_all(&cgroup_rmdir_waitq);
> -}
> -
> -void cgroup_exclude_rmdir(struct cgroup_subsys_state *css)
> -{
> - css_get(css);
> -}
> -
> -void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
> -{
> - cgroup_wakeup_rmdir_waiter(css->cgroup);
> - css_put(css);
> -}
> -
> -/*
> * Call with cgroup_mutex held. Drops reference counts on modules, including
> * any duplicate ones that parse_cgroupfs_options took. If this function
> * returns an error, no reference counts are touched.
> @@ -1963,12 +1936,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> }
>
> synchronize_rcu();
> -
> - /*
> - * wake up rmdir() waiter. the rmdir should fail since the cgroup
> - * is no longer empty.
> - */
> - cgroup_wakeup_rmdir_waiter(cgrp);
> out:
> if (retval) {
> for_each_subsys(root, ss) {
> @@ -2138,7 +2105,6 @@ static int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
> * step 5: success! and cleanup
> */
> synchronize_rcu();
> - cgroup_wakeup_rmdir_waiter(cgrp);
> retval = 0;
> out_put_css_set_refs:
> if (retval) {
> @@ -4058,26 +4024,13 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
> struct cgroup_event *event, *tmp;
> struct cgroup_subsys *ss;
>
> - /*
> - * In general, subsystem has no css->refcnt after pre_destroy(). But
> - * in racy cases, subsystem may have to get css->refcnt after
> - * pre_destroy() and it makes rmdir return with -EBUSY. This sometimes
> - * make rmdir return -EBUSY too often. To avoid that, we use waitqueue
> - * for cgroup's rmdir. CGRP_WAIT_ON_RMDIR is for synchronizing rmdir
> - * and subsystem's reference count handling. Please see css_get/put
> - * and css_tryget() and cgroup_wakeup_rmdir_waiter() implementation.
> - */
> - set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> -
> /* the vfs holds both inode->i_mutex already */
> mutex_lock(&cgroup_mutex);
> parent = cgrp->parent;
> if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
> - clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> mutex_unlock(&cgroup_mutex);
> return -EBUSY;
> }
> - prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
>
> /*
> * Block new css_tryget() by deactivating refcnt and mark @cgrp
> @@ -4114,9 +4067,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
> for_each_subsys(cgrp->root, ss)
> css_put(cgrp->subsys[ss->subsys_id]);
>
> - finish_wait(&cgroup_rmdir_waitq, &wait);
> - clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> -
> raw_spin_lock(&release_list_lock);
> if (!list_empty(&cgrp->release_list))
> list_del_init(&cgrp->release_list);
> @@ -4864,7 +4814,6 @@ void __css_put(struct cgroup_subsys_state *css)
> set_bit(CGRP_RELEASABLE, &cgrp->flags);
> check_for_release(cgrp);
> }
> - cgroup_wakeup_rmdir_waiter(cgrp);
> break;
> case 0:
> schedule_work(&css->dput_work);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6f8bd9d..1033b2b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2681,13 +2681,6 @@ static int mem_cgroup_move_account(struct page *page,
> /* caller should have done css_get */
> pc->mem_cgroup = to;
> mem_cgroup_charge_statistics(to, anon, nr_pages);
> - /*
> - * We charges against "to" which may not have any tasks. Then, "to"
> - * can be under rmdir(). But in current implementation, caller of
> - * this function is just force_empty() and move charge, so it's
> - * guaranteed that "to" is never removed. So, we don't check rmdir
> - * status here.
> - */
> move_unlock_mem_cgroup(from, &flags);
> ret = 0;
> unlock:
> @@ -2893,7 +2886,6 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *memcg,
> return;
> if (!memcg)
> return;
> - cgroup_exclude_rmdir(&memcg->css);
I'm sorry if I misunderstand...I think we need css_tryget() instead, here.
Following code will set pc->mem_cgroup to be this lost memcg. It's not good because
we assume there are no page_cgroup pointing this memcg after pre_desroy().
I think, at failure, we need to avoid set pc->mem_cgroup to this memcg.
Hmm, set pc->mem_cgroup as root_mem_cgroup should work.
>
> __mem_cgroup_commit_charge(memcg, page, 1, ctype, true);
> /*
> @@ -2907,12 +2899,6 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *memcg,
> swp_entry_t ent = {.val = page_private(page)};
> mem_cgroup_uncharge_swap(ent);
> }
> - /*
> - * At swapin, we may charge account against cgroup which has no tasks.
> - * So, rmdir()->pre_destroy() can be called while we do this charge.
> - * In that case, we need to call pre_destroy() again. check it here.
> - */
> - cgroup_release_and_wakeup_rmdir(&memcg->css);
css_put() ?
> }
>
> void mem_cgroup_commit_charge_swapin(struct page *page,
> @@ -3360,8 +3346,7 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
>
> if (!memcg)
> return;
> - /* blocks rmdir() */
> - cgroup_exclude_rmdir(&memcg->css);
css_tryget()
> +
> if (!migration_ok) {
> used = oldpage;
> unused = newpage;
> @@ -3395,13 +3380,6 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
> */
> if (anon)
> mem_cgroup_uncharge_page(used);
> - /*
> - * At migration, we may charge account against cgroup which has no
> - * tasks.
> - * So, rmdir()->pre_destroy() can be called while we do this charge.
> - * In that case, we need to call pre_destroy() again. check it here.
> - */
> - cgroup_release_and_wakeup_rmdir(&memcg->css);
css_put()..
Thanks,
-Kame
next prev parent reply other threads:[~2012-11-02 9:53 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-31 4:22 [PATCHSET] cgroup: simplify cgroup removal path Tejun Heo
[not found] ` <1351657365-25055-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 4:22 ` [PATCH 1/8] cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs Tejun Heo
[not found] ` <1351657365-25055-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:21 ` Glauber Costa
[not found] ` <509125D9.8070100-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 16:38 ` Tejun Heo
2012-10-31 14:37 ` Michal Hocko
[not found] ` <20121031143751.GA22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 16:41 ` Tejun Heo
[not found] ` <20121031164123.GD2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 16:48 ` Michal Hocko
[not found] ` <20121031164855.GI22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 17:22 ` Tejun Heo
2012-11-02 9:23 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 2/8] cgroup: kill CSS_REMOVED Tejun Heo
[not found] ` <1351657365-25055-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 15:39 ` Michal Hocko
[not found] ` <20121031153926.GC22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 16:57 ` Tejun Heo
[not found] ` <20121031165739.GE2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:06 ` Glauber Costa
[not found] ` <50915A87.4070504-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:10 ` Tejun Heo
[not found] ` <CAOS58YP=CjTPFdETLRXnc3gXEzX2=EEe2dMdSh3Eov7zRfV4Qg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-31 17:19 ` Glauber Costa
[not found] ` <50915DB7.5020706-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:25 ` Tejun Heo
[not found] ` <20121031172522.GJ2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:38 ` Glauber Costa
[not found] ` <50916218.3090301-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:44 ` Tejun Heo
2012-10-31 17:39 ` Glauber Costa
2012-10-31 19:16 ` Michal Hocko
[not found] ` <20121031191602.GB1271-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 19:33 ` Tejun Heo
2012-11-02 9:30 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 3/8] cgroup: use cgroup_lock_live_group(parent) in cgroup_create() Tejun Heo
[not found] ` <1351657365-25055-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 15:55 ` Michal Hocko
[not found] ` <20121031155514.GD22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 17:04 ` Tejun Heo
[not found] ` <20121031170431.GF2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-11-01 9:16 ` Michal Hocko
[not found] ` <20121101091644.GA8533-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-01 14:52 ` Tejun Heo
[not found] ` <CAOS58YM+kRtspVUzmnSmOmrDoNS_kF6KA02zWGxqH5FUcRWo1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-01 15:05 ` Michal Hocko
[not found] ` <20121101150532.GA5065-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-01 15:15 ` Michal Hocko
[not found] ` <20121101151556.GB5065-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-11-01 15:43 ` Tejun Heo
2012-11-02 9:37 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy() Tejun Heo
[not found] ` <1351657365-25055-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:42 ` Glauber Costa
2012-10-31 16:05 ` Michal Hocko
2012-11-02 9:43 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() Tejun Heo
[not found] ` <1351657365-25055-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 16:27 ` Michal Hocko
[not found] ` <20121031162735.GF22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 17:16 ` Tejun Heo
2012-11-02 9:53 ` Kamezawa Hiroyuki [this message]
2012-10-31 4:22 ` [PATCH 6/8] memcg: make mem_cgroup_reparent_charges non failing Tejun Heo
[not found] ` <1351657365-25055-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 9:54 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 7/8] hugetlb: do not fail in hugetlb_cgroup_pre_destroy Tejun Heo
[not found] ` <1351657365-25055-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 9:56 ` Kamezawa Hiroyuki
2012-10-31 4:22 ` [PATCH 8/8] cgroup: make ->pre_destroy() return void Tejun Heo
[not found] ` <1351657365-25055-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 13:57 ` Vivek Goyal
2012-10-31 16:28 ` Michal Hocko
2012-11-02 9:57 ` Kamezawa Hiroyuki
2012-10-31 13:49 ` [PATCHSET] cgroup: simplify cgroup removal path Glauber Costa
[not found] ` <50912C6D.6020000-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:18 ` Tejun Heo
[not found] ` <20121031171849.GH2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:24 ` Glauber Costa
[not found] ` <50915EB6.3060704-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-10-31 17:26 ` Tejun Heo
[not found] ` <20121031172617.GK2945-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-31 17:33 ` Glauber Costa
2012-10-31 16:31 ` Michal Hocko
[not found] ` <20121031163134.GH22809-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2012-10-31 16:35 ` Tejun Heo
-- strict thread matches above, loose matches on Subject: below --
2012-10-31 18:16 [PATCHSET v2] " Tejun Heo
2012-10-31 18:16 ` [PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() Tejun Heo
2012-10-31 19:44 [PATCHSET RESEND v2] cgroup: simplify cgroup removal path Tejun Heo
[not found] ` <1351712650-23709-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-31 19:44 ` [PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() Tejun Heo
[not found] ` <1351712650-23709-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-11-02 10:20 ` Kamezawa Hiroyuki
2012-11-05 5:40 ` Li Zefan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50939801.4000504@jp.fujitsu.com \
--to=kamezawa.hiroyu-+cum20s59erqfuhtdcdx3a@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=mhocko-AlSwsSmVLrQ@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).