From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
To: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	KAMEZAWA Hiroyuki
	<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	Balbir Singh
	<bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH 4/6] cgroups: forbid pre_destroy callback to fail
Date: Fri, 19 Oct 2012 13:09:49 +0200
Message-ID: <20121019110949.GC799@dhcp22.suse.cz>
In-Reply-To: <50811E5E.1090205-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

On Fri 19-10-12 17:33:18, Li Zefan wrote:
> On 2012/10/17 21:30, Michal Hocko wrote:
> > Now that mem_cgroup_pre_destroy callback doesn't fail finally we can
> > safely move on and forbit all the callbacks to fail. The last missing
> > piece is moving cgroup_call_pre_destroy after cgroup_clear_css_refs so
> > that css_tryget fails so no new charges for the memcg can happen.
> 
> > The callbacks are also called from within cgroup_lock to guarantee that
> > no new tasks show up. 
> 
> I'm afraid this won't work. See commit 3fa59dfbc3b223f02c26593be69ce6fc9a940405
> ("cgroup: fix potential deadlock in pre_destroy")

Very good point. Thanks for pointing this out. So we should call
pre_destroy at the very end? What about the following?
Or should we rather drop the lock after check_for_release(parent), or
sooner but after CGRP_REMOVED is set?
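
To make the intended ordering easier to discuss, here is a rough
userspace sketch of it. This is not kernel code: every name starting
with model_ is hypothetical, a pthread mutex stands in for cgroup_mutex,
and a negative atomic counter stands in for a cleared css refcount. It
also simplifies the busy case to an immediate error return instead of
waiting and retrying as cgroup_rmdir does. The only points it tries to
show are that tryget fails once the references are cleared, and that the
callback runs after the lock has been dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_group {
	pthread_mutex_t lock;      /* stands in for cgroup_mutex */
	atomic_int refcnt;         /* stands in for css->refcnt; negative = cleared */
	bool pre_destroy_unlocked; /* set if the callback ran without the lock */
};

static void model_init(struct model_group *g)
{
	pthread_mutex_init(&g->lock, NULL);
	atomic_init(&g->refcnt, 0);
	g->pre_destroy_unlocked = false;
}

/* Take a reference unless the count has already been cleared. */
static bool model_tryget(struct model_group *g)
{
	int old = atomic_load(&g->refcnt);

	while (old >= 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&g->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_group *g)
{
	assert(atomic_load(&g->refcnt) > 0);
	atomic_fetch_sub(&g->refcnt, 1);
}

/* Callback that cannot fail; it records whether the lock was free. */
static void model_pre_destroy(struct model_group *g)
{
	if (pthread_mutex_trylock(&g->lock) == 0) {
		g->pre_destroy_unlocked = true;
		pthread_mutex_unlock(&g->lock);
	}
}

/* Returns 0 on success, -1 if references are still outstanding. */
static int model_rmdir(struct model_group *g)
{
	int expected = 0;

	pthread_mutex_lock(&g->lock);
	/* Clear the refcount only if nobody holds a reference. */
	if (!atomic_compare_exchange_strong(&g->refcnt, &expected, -1)) {
		pthread_mutex_unlock(&g->lock);
		return -1;
	}
	pthread_mutex_unlock(&g->lock);

	/* From here on model_tryget() fails, so no new charges can appear. */
	model_pre_destroy(g);
	return 0;
}
```

With this model, a model_rmdir() attempted while a reference is held
returns -1; after model_put() it succeeds, and any later model_tryget()
fails.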
---
From 70ea8718aba1c1784b94bfb26aa2307195c07c0b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Date: Wed, 17 Oct 2012 13:42:06 +0200
Subject: [PATCH] cgroups: forbid pre_destroy callback to fail

Now that the mem_cgroup_pre_destroy callback can no longer fail, we can
finally move on and forbid all the callbacks from failing. The last
missing piece is moving cgroup_call_pre_destroy after
cgroup_clear_css_refs, so that css_tryget fails and no new charges for
the memcg can happen. We cannot, however, call it immediately after
cgroup_clear_css_refs, because mem_cgroup_pre_destroy must not be called
with the cgroup_lock held (see 3fa59dfb "cgroup: fix potential deadlock
in pre_destroy"), so we have to move it after the lock is released.

Changes since v1
- Li Zefan pointed out that mem_cgroup_pre_destroy cannot be called with
  cgroup_lock held

Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
---
 kernel/cgroup.c |   30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b7d9606..4c6adbd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -855,7 +855,7 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
  * Call subsys's pre_destroy handler.
  * This is called before css refcnt check.
  */
-static int cgroup_call_pre_destroy(struct cgroup *cgrp)
+static void cgroup_call_pre_destroy(struct cgroup *cgrp)
 {
 	struct cgroup_subsys *ss;
 	int ret = 0;
@@ -864,15 +864,8 @@ static int cgroup_call_pre_destroy(struct cgroup *cgrp)
 		if (!ss->pre_destroy)
 			continue;
 
-		ret = ss->pre_destroy(cgrp);
-		if (ret) {
-			/* ->pre_destroy() failure is being deprecated */
-			WARN_ON_ONCE(!ss->__DEPRECATED_clear_css_refs);
-			break;
-		}
+		BUG_ON(ss->pre_destroy(cgrp));
 	}
-
-	return ret;
 }
 
 static void cgroup_diput(struct dentry *dentry, struct inode *inode)
@@ -4161,7 +4154,6 @@ again:
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4174,17 +4166,6 @@ again:
 	 */
 	set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
 
-	/*
-	 * Call pre_destroy handlers of subsys. Notify subsystems
-	 * that rmdir() request comes.
-	 */
-	ret = cgroup_call_pre_destroy(cgrp);
-	if (ret) {
-		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
-		return ret;
-	}
-
-	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
@@ -4206,6 +4187,7 @@ again:
 			return -EINTR;
 		goto again;
 	}
+
 	/* NO css_tryget() can success after here. */
 	finish_wait(&cgroup_rmdir_waitq, &wait);
 	clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
@@ -4244,6 +4226,12 @@ again:
 	spin_unlock(&cgrp->event_list_lock);
 
 	mutex_unlock(&cgroup_mutex);
+
+	/*
+	 * Call pre_destroy handlers of subsys. Notify subsystems
+	 * that rmdir() request comes.
+	 */
+	cgroup_call_pre_destroy(cgrp);
 	return 0;
 }
 
-- 
1.7.10.4


-- 
Michal Hocko
SUSE Labs
