From: "Aneesh Kumar K.V" <aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	mm-commits-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	liwanp-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
	Tejun Heo <htejun-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
	cgroups mailinglist
	<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Subject: Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
Date: Thu, 19 Jul 2012 19:18:24 +0530
Message-ID: <87ipdjc15j.fsf@skywalker.in.ibm.com>
In-Reply-To: <20120719123820.GG2864-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org>

Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:

> On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
>> Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:
>> 
>> > From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
>> > From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
>> > Date: Thu, 19 Jul 2012 13:23:23 +0200
>> > Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>> >
>> > 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
>> > cgroup_mutex lock while calling pre_destroy callbacks because memory
>> > controller could deadlock because force_empty triggered reclaim.
>> > Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
>> > no reclaim going on from mem_cgroup_force_empty though so we can safely
>> > keep the cgroup_mutex locked. This has an advantage that no tasks might
>> > be added during pre_destroy callback and so the handlers don't have to
>> > consider races when new tasks add new charges. This simplifies the
>> > implementation.
>> > ---
>> >  kernel/cgroup.c |    2 --
>> >  1 file changed, 2 deletions(-)
>> >
>> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> > index 0f3527d..9dba05d 100644
>> > --- a/kernel/cgroup.c
>> > +++ b/kernel/cgroup.c
>> > @@ -4181,7 +4181,6 @@ again:
>> >  		mutex_unlock(&cgroup_mutex);
>> >  		return -EBUSY;
>> >  	}
>> > -	mutex_unlock(&cgroup_mutex);
>> >
>> >  	/*
>> >  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
>> > @@ -4204,7 +4203,6 @@ again:
>> >  		return ret;
>> >  	}
>> >
>> > -	mutex_lock(&cgroup_mutex);
>> >  	parent = cgrp->parent;
>> >  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>> >  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
>> 
>> mem_cgroup_force_empty still calls 
>> 
>> lru_add_drain_all 
>>    ->schedule_on_each_cpu
>>         -> get_online_cpus
>>            ->mutex_lock(&cpu_hotplug.lock);
>> 
>> So won't we deadlock?
>
> Yes, you are right. I got it wrong. I thought that reclaim was the
> main problem. It won't be that easy then, and the original mm patch
> (hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
> or needs to be dropped.

We just need to remove the VM_BUG_ON(), right? The rest of the patch is
good, right? Otherwise, how about the patch below?

NOTE: Do we want to do s/mutex_[un]lock(&cgroup_mutex)/cgroup_[un]lock()/ ?

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7981850..01c67f4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4151,7 +4151,6 @@ again:
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4171,10 +4170,10 @@ again:
 	ret = cgroup_call_pre_destroy(cgrp);
 	if (ret) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
+		mutex_unlock(&cgroup_mutex);
 		return ret;
 	}
 
-	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ddc00..91c96df 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4993,9 +4993,18 @@ free_out:
 
 static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
+	int ret;
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
-	return mem_cgroup_force_empty(memcg, false);
+	cgroup_unlock();
+	/*
+	 * We call lru_add_drain_all(), which ends up taking
+	 * mutex_lock(&cpu_hotplug.lock), but cpuset takes these locks
+	 * in the reverse order. So drop the cgroup lock here.
+	 */
+	ret = mem_cgroup_force_empty(memcg, false);
+	cgroup_lock();
+	return ret;
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)
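
For illustration only, here is a userspace sketch of the locking pattern the memcontrol.c hunk is aiming for: the caller holds the cgroup lock, the callback drops it around the call that takes the hotplug lock, and retakes it before returning. This is not kernel code. The pthread mutexes and every `*_demo` name are hypothetical stand-ins, and the `cgroup_held` flag is a single-threaded bookkeeping aid added only so the ordering rule can be asserted.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for cgroup_mutex and cpu_hotplug.lock. */
static pthread_mutex_t cgroup_mutex_demo = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpu_hotplug_lock_demo = PTHREAD_MUTEX_INITIALIZER;

/* Single-threaded bookkeeping: 1 while cgroup_mutex_demo is held. */
static int cgroup_held;

static void cgroup_lock_demo(void)
{
	pthread_mutex_lock(&cgroup_mutex_demo);
	cgroup_held = 1;
}

static void cgroup_unlock_demo(void)
{
	cgroup_held = 0;
	pthread_mutex_unlock(&cgroup_mutex_demo);
}

/*
 * Stand-in for mem_cgroup_force_empty(): it takes the hotplug lock,
 * as the lru_add_drain_all() -> get_online_cpus() chain would.
 */
static int force_empty_demo(void)
{
	/* Ordering rule: never take the hotplug lock under the cgroup lock. */
	assert(!cgroup_held);

	pthread_mutex_lock(&cpu_hotplug_lock_demo);
	/* ... draining work would happen here ... */
	pthread_mutex_unlock(&cpu_hotplug_lock_demo);
	return 0;
}

/* Called with the cgroup lock held; returns with it held again. */
static int pre_destroy_demo(void)
{
	int ret;

	cgroup_unlock_demo();
	ret = force_empty_demo();
	cgroup_lock_demo();
	return ret;
}

/* Stand-in for the rmdir path, which keeps the lock across the callback. */
static int rmdir_demo(void)
{
	int ret;

	cgroup_lock_demo();
	ret = pre_destroy_demo();
	cgroup_unlock_demo();
	return ret;
}
```

Calling `rmdir_demo()` from a test harness exercises the whole path. The point of the sketch is the symmetry: the callback must hand the lock back in the state the caller expects, otherwise the caller's later unlock operates on a mutex it no longer holds.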

