* Re: [PATCH 2/2] memcg: change memcg_oom_mutex to spinlock
From: KAMEZAWA Hiroyuki @ 2011-07-20 5:55 UTC
To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Andrew Morton, linux-kernel

On Thu, 14 Jul 2011 17:29:51 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> memcg_oom_mutex is used to protect memcg OOM path and eventfd interface
> for oom_control. None of the critical sections which it protects sleep
> (eventfd_signal works from atomic context and the rest are simple linked
> list resp. oom_lock atomic operations).
> Mutex is also too heavy weight for those code paths because it triggers
> a lot of scheduling. It also makes convoying effects more visible
> when we have a big number of oom killing because we take the lock
> multiple times during mem_cgroup_handle_oom so we have multiple places
> where many processes can sleep.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
* Re: [PATCH 2/2] memcg: change memcg_oom_mutex to spinlock
From: Michal Hocko @ 2011-07-20 7:01 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Andrew Morton, linux-kernel

On Wed 20-07-11 14:55:53, KAMEZAWA Hiroyuki wrote:
> On Thu, 14 Jul 2011 17:29:51 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
>
> > memcg_oom_mutex is used to protect memcg OOM path and eventfd interface
> > for oom_control. None of the critical sections which it protects sleep
> > (eventfd_signal works from atomic context and the rest are simple linked
> > list resp. oom_lock atomic operations).
> > Mutex is also too heavy weight for those code paths because it triggers
> > a lot of scheduling. It also makes convoying effects more visible
> > when we have a big number of oom killing because we take the lock
> > multiple times during mem_cgroup_handle_oom so we have multiple places
> > where many processes can sleep.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks!
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH 2/2] memcg: change memcg_oom_mutex to spinlock
From: Balbir Singh @ 2011-07-20 6:34 UTC
To: Michal Hocko; +Cc: linux-mm, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Thu, Jul 14, 2011 at 8:59 PM, Michal Hocko <mhocko@suse.cz> wrote:
> memcg_oom_mutex is used to protect memcg OOM path and eventfd interface
> for oom_control. None of the critical sections which it protects sleep
> (eventfd_signal works from atomic context and the rest are simple linked
> list resp. oom_lock atomic operations).
> Mutex is also too heavy weight for those code paths because it triggers
> a lot of scheduling. It also makes convoying effects more visible
> when we have a big number of oom killing because we take the lock
> multiple times during mem_cgroup_handle_oom so we have multiple places
> where many processes can sleep.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Quick question: How long do we expect this lock to be taken? What
happens under oom? Any tests? Numbers?

Balbir Singh
* Re: [PATCH 2/2] memcg: change memcg_oom_mutex to spinlock
From: Michal Hocko @ 2011-07-20 7:00 UTC
To: Balbir Singh; +Cc: linux-mm, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Wed 20-07-11 12:04:17, Balbir Singh wrote:
> On Thu, Jul 14, 2011 at 8:59 PM, Michal Hocko <mhocko@suse.cz> wrote:
> > memcg_oom_mutex is used to protect memcg OOM path and eventfd interface
> > for oom_control. None of the critical sections which it protects sleep
> > (eventfd_signal works from atomic context and the rest are simple linked
> > list resp. oom_lock atomic operations).
> > Mutex is also too heavy weight for those code paths because it triggers
> > a lot of scheduling. It also makes convoying effects more visible
> > when we have a big number of oom killing because we take the lock
> > multiple times during mem_cgroup_handle_oom so we have multiple places
> > where many processes can sleep.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
>
> Quick question: How long do we expect this lock to be taken?

The lock is taken in
* mem_cgroup_handle_oom at 2 places
	- to protect mem_cgroup_oom_lock and mem_cgroup_oom_notify
	- to protect mem_cgroup_oom_unlock and memcg_wakeup_oom

  mem_cgroup_oom_{un}lock as well as mem_cgroup_oom_notify scale with
  the number of groups in the hierarchy. memcg_wakeup_oom scales with
  the number of all blocked tasks on the memcg_oom_waitq (which is not
  mem_cgroup specific) and memcg_oom_wake_function can go up the
  hierarchy for all of them in the worst case.

* mem_cgroup_oom_register_event uses it to protect notifier registration
  (one list_add operation) + notification in case the group is already
  under oom - we can consider both operations to be constant time

* mem_cgroup_oom_unregister_event protects unregistration so it scales
  with the number of notifiers. I guess this is potentially unlimited
  but I wouldn't be afraid of that as we just call list_del on every
  one.

> What happens under oom?

Could you be more specific? Does the above explain it?

> Any tests? Numbers?

I was testing with the test mentioned in the other patch and I couldn't
measure any significant difference. That is why I noted that I do not
have any hard numbers to base my argument on. It is just that the mutex
doesn't _feel_ right in the code paths where we are using it now.

> Balbir Singh

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
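[Editorial illustration] The argument above is that every critical section under the lock is short and never sleeps, which is the case where a spinning lock is reasonable. A minimal userspace sketch of that pattern follows, assuming a C11 compiler. `struct spinlock`, `struct notifier`, `notifier_list_init()`, `notifier_register()` and `notifier_unregister()` are hypothetical names made up for illustration; they are not the kernel's API, and the busy-wait lock is only a stand-in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel spinlock. */
struct spinlock {
	atomic_flag held;
};

void spin_lock(struct spinlock *l)
{
	/* Busy-wait; tolerable only because lock holders never sleep. */
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;
}

void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical notifier entry, loosely modelled on the oom eventfd list. */
struct notifier {
	struct notifier *next;
	int id;
};

struct notifier_list {
	struct spinlock lock;
	struct notifier *head;
};

void notifier_list_init(struct notifier_list *nl)
{
	assert(nl != NULL);
	atomic_flag_clear(&nl->lock.held);
	nl->head = NULL;
}

/* Constant time: a single pointer update under the lock. */
void notifier_register(struct notifier_list *nl, struct notifier *n)
{
	assert(nl != NULL && n != NULL);
	spin_lock(&nl->lock);
	n->next = nl->head;
	nl->head = n;
	spin_unlock(&nl->lock);
}

/* Linear in the number of notifiers, but each step is a plain unlink. */
size_t notifier_unregister(struct notifier_list *nl, int id)
{
	struct notifier **pp;
	size_t removed = 0;

	assert(nl != NULL);
	spin_lock(&nl->lock);
	pp = &nl->head;
	while (*pp) {
		if ((*pp)->id == id) {
			*pp = (*pp)->next;
			removed++;
		} else {
			pp = &(*pp)->next;
		}
	}
	spin_unlock(&nl->lock);
	return removed;
}
```

Registration touches one pointer and unregistration walks the list once, so neither path has a reason to sleep while holding the lock. Whether spinning actually beats a mutex here is not shown by this sketch; the thread itself notes that no measurable difference was found.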
[parent not found: <44ec61829ed8a83b55dc90a7aebffdd82fe0e102.1310732789.git.mhocko@suse.cz>]
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: Andrew Morton @ 2011-07-21 20:58 UTC
To: Michal Hocko; +Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, linux-kernel

On Wed, 13 Jul 2011 13:05:49 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> @@ -1893,6 +1942,8 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *mem, gfp_t mask)

does:

: 	memcg_wakeup_oom(mem);
: 	mutex_unlock(&memcg_oom_mutex);
:
: 	mem_cgroup_unmark_under_oom(mem);
:
: 	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
: 		return false;
: 	/* Give chance to dying process */
: 	schedule_timeout(1);
: 	return true;
: }

Calling schedule_timeout() in state TASK_RUNNING is equivalent to
calling schedule() and then pointlessly wasting some CPU cycles.

Someone might want to take a look at that, and wonder why this bug
wasn't detected in testing ;)
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: KAMEZAWA Hiroyuki @ 2011-07-22 0:15 UTC
To: Andrew Morton; +Cc: Michal Hocko, linux-mm, Balbir Singh, linux-kernel

On Thu, 21 Jul 2011 13:58:17 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 13 Jul 2011 13:05:49 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
>
> > @@ -1893,6 +1942,8 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *mem, gfp_t mask)
>
> does:
>
> : 	memcg_wakeup_oom(mem);
> : 	mutex_unlock(&memcg_oom_mutex);
> :
> : 	mem_cgroup_unmark_under_oom(mem);
> :
> : 	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
> : 		return false;
> : 	/* Give chance to dying process */
> : 	schedule_timeout(1);
> : 	return true;
> : }
>
> Calling schedule_timeout() in state TASK_RUNNING is equivalent to
> calling schedule() and then pointlessly wasting some CPU cycles.
>

Ouch (--;

> Someone might want to take a look at that, and wonder why this bug
> wasn't detected in testing ;)
>

I wonder whether just removing this is okay... because we didn't notice
this in our recent oom tests. I'll do some.

Thanks,
-Kame
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: Johannes Weiner @ 2011-08-09 14:03 UTC
To: Michal Hocko
Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Wed, Jul 13, 2011 at 01:05:49PM +0200, Michal Hocko wrote:
> @@ -1803,37 +1806,83 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>  /*
>   * Check OOM-Killer is already running under our hierarchy.
>   * If someone is running, return false.
> + * Has to be called with memcg_oom_mutex
>   */
>  static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
>  {
> -	int x, lock_count = 0;
> -	struct mem_cgroup *iter;
> +	int lock_count = -1;
> +	struct mem_cgroup *iter, *failed = NULL;
> +	bool cond = true;
>
> -	for_each_mem_cgroup_tree(iter, mem) {
> -		x = atomic_inc_return(&iter->oom_lock);
> -		lock_count = max(x, lock_count);
> +	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> +		bool locked = iter->oom_lock;
> +
> +		iter->oom_lock = true;
> +		if (lock_count == -1)
> +			lock_count = iter->oom_lock;
> +		else if (lock_count != locked) {
> +			/*
> +			 * this subtree of our hierarchy is already locked
> +			 * so we cannot give a lock.
> +			 */
> +			lock_count = 0;
> +			failed = iter;
> +			cond = false;
> +		}

I noticed system-wide hangs during a parallel/hierarchical memcg test
and found that a single task with a central i_mutex held was sleeping
on the memcg oom waitqueue, stalling everyone else contending for that
same inode.

The problem is the above code, which never succeeds in hierarchies
with more than one member. The first task going OOM tries to oom lock
the hierarchy, fails, goes to sleep on the OOM waitqueue with the
mutex held, without anybody actually OOM killing anything to make
progress.

Here is a patch that rectified things for me.

---
From c4b52cbe01ed67d6487a96850400cdf5a9de91aa Mon Sep 17 00:00:00 2001
From: Johannes Weiner <jweiner@redhat.com>
Date: Tue, 9 Aug 2011 15:31:30 +0200
Subject: [patch] memcg: fix hierarchical oom locking

Commit "79dfdac memcg: make oom_lock 0 and 1 based rather than
counter" tried to oom lock the hierarchy and roll back upon
encountering an already locked memcg.

The code is pretty confused when it comes to detecting a locked memcg,
though, so it would fail and roll back after locking one memcg and
encountering an unlocked second one.

The result is that oom-locking hierarchies fails unconditionally and
that every oom killer invocation simply goes to sleep on the oom
waitqueue forever. The tasks practically hang forever without anyone
intervening, possibly holding locks that trip up unrelated tasks, too.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 930de94..649c568 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1841,25 +1841,19 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
  */
 static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
 {
-	int lock_count = -1;
 	struct mem_cgroup *iter, *failed = NULL;
 	bool cond = true;
 
 	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
-		bool locked = iter->oom_lock;
-
-		iter->oom_lock = true;
-		if (lock_count == -1)
-			lock_count = iter->oom_lock;
-		else if (lock_count != locked) {
+		if (iter->oom_lock) {
 			/*
 			 * this subtree of our hierarchy is already locked
 			 * so we cannot give a lock.
 			 */
-			lock_count = 0;
 			failed = iter;
 			cond = false;
-		}
+		} else
+			iter->oom_lock = true;
 	}
 
 	if (!failed)
@@ -1878,7 +1872,7 @@ static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
 		iter->oom_lock = false;
 	}
 done:
-	return lock_count;
+	return failed == NULL;
 }
 
 /*
-- 
1.7.6
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: Michal Hocko @ 2011-08-09 15:22 UTC
To: Johannes Weiner
Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Tue 09-08-11 16:03:12, Johannes Weiner wrote:
> On Wed, Jul 13, 2011 at 01:05:49PM +0200, Michal Hocko wrote:
> > @@ -1803,37 +1806,83 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
> >  /*
> >   * Check OOM-Killer is already running under our hierarchy.
> >   * If someone is running, return false.
> > + * Has to be called with memcg_oom_mutex
> >   */
> >  static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
> >  {
> > -	int x, lock_count = 0;
> > -	struct mem_cgroup *iter;
> > +	int lock_count = -1;
> > +	struct mem_cgroup *iter, *failed = NULL;
> > +	bool cond = true;
> >
> > -	for_each_mem_cgroup_tree(iter, mem) {
> > -		x = atomic_inc_return(&iter->oom_lock);
> > -		lock_count = max(x, lock_count);
> > +	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> > +		bool locked = iter->oom_lock;
> > +
> > +		iter->oom_lock = true;
> > +		if (lock_count == -1)
> > +			lock_count = iter->oom_lock;
> > +		else if (lock_count != locked) {
> > +			/*
> > +			 * this subtree of our hierarchy is already locked
> > +			 * so we cannot give a lock.
> > +			 */
> > +			lock_count = 0;
> > +			failed = iter;
> > +			cond = false;
> > +		}
>
> I noticed system-wide hangs during a parallel/hierarchical memcg test
> and found that a single task with a central i_mutex held was sleeping
> on the memcg oom waitqueue, stalling everyone else contending for that
> same inode.

Nasty. Thanks for reporting and fixing this. The condition is screwed
totally :/

>
> The problem is the above code, which never succeeds in hierarchies
> with more than one member. The first task going OOM tries to oom lock
> the hierarchy, fails, goes to sleep on the OOM waitqueue with the
> mutex held, without anybody actually OOM killing anything to make
> progress.
>
> Here is a patch that rectified things for me.
>
> ---
> From c4b52cbe01ed67d6487a96850400cdf5a9de91aa Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <jweiner@redhat.com>
> Date: Tue, 9 Aug 2011 15:31:30 +0200
> Subject: [patch] memcg: fix hierarchical oom locking
>
> Commit "79dfdac memcg: make oom_lock 0 and 1 based rather than
> counter" tried to oom lock the hierarchy and roll back upon
> encountering an already locked memcg.
>
> The code is pretty confused when it comes to detecting a locked memcg,
> though, so it would fail and roll back after locking one memcg and
> encountering an unlocked second one.
>
> The result is that oom-locking hierarchies fails unconditionally and
> that every oom killer invocation simply goes to sleep on the oom
> waitqueue forever. The tasks practically hang forever without anyone
> intervening, possibly holding locks that trip up unrelated tasks, too.
>
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Looks good. Thanks!
Just a minor nit about the done label below.

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |   14 ++++----------
>  1 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 930de94..649c568 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1841,25 +1841,19 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>   */
>  static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
>  {
> -	int lock_count = -1;

Yes, the whole lock_count thingy is just stupid. We care just about all
or nothing, and the state of the first one is really not important.

>  	struct mem_cgroup *iter, *failed = NULL;
>  	bool cond = true;
>
>  	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> -		bool locked = iter->oom_lock;
> -
> -		iter->oom_lock = true;
> -		if (lock_count == -1)
> -			lock_count = iter->oom_lock;
> -		else if (lock_count != locked) {
> +		if (iter->oom_lock) {
>  			/*
>  			 * this subtree of our hierarchy is already locked
>  			 * so we cannot give a lock.
>  			 */
> -			lock_count = 0;
>  			failed = iter;
>  			cond = false;
> -		}
> +		} else
> +			iter->oom_lock = true;
>  	}
>
>  	if (!failed)

We can return here and get rid of the done label.

> @@ -1878,7 +1872,7 @@ static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
>  		iter->oom_lock = false;
>  	}
>  done:
> -	return lock_count;
> +	return failed == NULL;
>  }
>
>  /*
> --
> 1.7.6

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: Johannes Weiner @ 2011-08-09 15:37 UTC
To: Michal Hocko
Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Tue, Aug 09, 2011 at 05:22:18PM +0200, Michal Hocko wrote:
> On Tue 09-08-11 16:03:12, Johannes Weiner wrote:
> >  	struct mem_cgroup *iter, *failed = NULL;
> >  	bool cond = true;
> >
> >  	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> > -		bool locked = iter->oom_lock;
> > -
> > -		iter->oom_lock = true;
> > -		if (lock_count == -1)
> > -			lock_count = iter->oom_lock;
> > -		else if (lock_count != locked) {
> > +		if (iter->oom_lock) {
> >  			/*
> >  			 * this subtree of our hierarchy is already locked
> >  			 * so we cannot give a lock.
> >  			 */
> > -			lock_count = 0;
> >  			failed = iter;
> >  			cond = false;
> > -		}
> > +		} else
> > +			iter->oom_lock = true;
> >  	}
> >
> >  	if (!failed)
>
> We can return here and get rid of done label.

Ah, right you are. Here is an update.

---
From 86b36904033e6c6a1af4716e9deef13ebd31e64c Mon Sep 17 00:00:00 2001
From: Johannes Weiner <jweiner@redhat.com>
Date: Tue, 9 Aug 2011 15:31:30 +0200
Subject: [patch] memcg: fix hierarchical oom locking

Commit "79dfdac memcg: make oom_lock 0 and 1 based rather than
counter" tried to oom lock the hierarchy and roll back upon
encountering an already locked memcg.

The code is confused when it comes to detecting a locked memcg,
though, so it would fail and roll back after locking one memcg and
encountering an unlocked second one.

The result is that oom-locking hierarchies fails unconditionally and
that every oom killer invocation simply goes to sleep on the oom
waitqueue forever. The tasks practically hang forever without anyone
intervening, possibly holding locks that trip up unrelated tasks, too.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   17 +++++------------
 1 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c6faa32..f39c8fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1843,29 +1843,23 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
  */
 static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
 {
-	int lock_count = -1;
 	struct mem_cgroup *iter, *failed = NULL;
 	bool cond = true;
 
 	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
-		bool locked = iter->oom_lock;
-
-		iter->oom_lock = true;
-		if (lock_count == -1)
-			lock_count = iter->oom_lock;
-		else if (lock_count != locked) {
+		if (iter->oom_lock) {
 			/*
 			 * this subtree of our hierarchy is already locked
 			 * so we cannot give a lock.
 			 */
-			lock_count = 0;
 			failed = iter;
 			cond = false;
-		}
+		} else
+			iter->oom_lock = true;
 	}
 
 	if (!failed)
-		goto done;
+		return true;
 
 	/*
 	 * OK, we failed to lock the whole subtree so we have to clean up
@@ -1879,8 +1873,7 @@ static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
 		}
 		iter->oom_lock = false;
 	}
-done:
-	return lock_count;
+	return false;
 }
 
 /*
-- 
1.7.6
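[Editorial illustration] As a reading aid, the all-or-nothing locking that the patch above implements can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct node`, `subtree_oom_lock()` and `subtree_oom_lock_buggy()` are hypothetical names, the hierarchy is flattened into an array in iteration order, and the outer memcg_oom_mutex is not modelled. The buggy variant mirrors only the detection loop of the v2 patch and leaves out its rollback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only the oom_lock flag. */
struct node {
	bool oom_lock;
};

/*
 * Model of the fixed logic: lock every node of the subtree or none.
 * "tree" is the hierarchy flattened in iteration order, which is an
 * assumption of this sketch.
 */
bool subtree_oom_lock(struct node *tree, size_t n)
{
	size_t i, failed = n;	/* n means "nothing failed" */

	assert(tree != NULL);
	for (i = 0; i < n; i++) {
		if (tree[i].oom_lock) {
			/* Part of the subtree is already locked: give up. */
			failed = i;
			break;
		}
		tree[i].oom_lock = true;
	}
	if (failed == n)
		return true;

	/* Undo only what we set before reaching the failing node. */
	for (i = 0; i < failed; i++)
		tree[i].oom_lock = false;
	return false;
}

/*
 * Model of the broken detection loop, for comparison only. Rollback is
 * left out, so the array stays modified after the call.
 */
bool subtree_oom_lock_buggy(struct node *tree, size_t n)
{
	int lock_count = -1;
	size_t i;

	assert(tree != NULL);
	for (i = 0; i < n; i++) {
		bool locked = tree[i].oom_lock;

		tree[i].oom_lock = true;
		if (lock_count == -1) {
			lock_count = tree[i].oom_lock;	/* always 1 here */
		} else if (lock_count != locked) {
			/* Triggers when a later node was NOT locked. */
			lock_count = 0;
			break;
		}
	}
	return lock_count != 0;
}
```

With two unlocked nodes the buggy loop reports failure, and with an unlocked node followed by an already locked one it reports success. Those are the two symptoms described in this thread: hierarchies with more than one member never get locked, and a hierarchy containing a locked subtree can be locked anyway.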
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: Michal Hocko @ 2011-08-09 15:43 UTC
To: Johannes Weiner
Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Tue 09-08-11 17:37:32, Johannes Weiner wrote:
> On Tue, Aug 09, 2011 at 05:22:18PM +0200, Michal Hocko wrote:
> > On Tue 09-08-11 16:03:12, Johannes Weiner wrote:
> > >  	struct mem_cgroup *iter, *failed = NULL;
> > >  	bool cond = true;
> > >
> > >  	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> > > -		bool locked = iter->oom_lock;
> > > -
> > > -		iter->oom_lock = true;
> > > -		if (lock_count == -1)
> > > -			lock_count = iter->oom_lock;
> > > -		else if (lock_count != locked) {
> > > +		if (iter->oom_lock) {
> > >  			/*
> > >  			 * this subtree of our hierarchy is already locked
> > >  			 * so we cannot give a lock.
> > >  			 */
> > > -			lock_count = 0;
> > >  			failed = iter;
> > >  			cond = false;
> > > -		}
> > > +		} else
> > > +			iter->oom_lock = true;
> > >  	}
> > >
> > >  	if (!failed)
> >
> > We can return here and get rid of done label.
>
> Ah, right you are. Here is an update.

Thanks!

>
> ---
> From 86b36904033e6c6a1af4716e9deef13ebd31e64c Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <jweiner@redhat.com>
> Date: Tue, 9 Aug 2011 15:31:30 +0200
> Subject: [patch] memcg: fix hierarchical oom locking
>
> Commit "79dfdac memcg: make oom_lock 0 and 1 based rather than
> counter" tried to oom lock the hierarchy and roll back upon
> encountering an already locked memcg.
>
> The code is confused when it comes to detecting a locked memcg,
> though, so it would fail and roll back after locking one memcg and
> encountering an unlocked second one.

It is actually worse than that. The way it is broken also allows
locking a hierarchy which already contains a locked subtree...

>
> The result is that oom-locking hierarchies fails unconditionally and
> that every oom killer invocation simply goes to sleep on the oom
> waitqueue forever. The tasks practically hang forever without anyone
> intervening, possibly holding locks that trip up unrelated tasks, too.
>
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/memcontrol.c |   17 +++++------------
>  1 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c6faa32..f39c8fb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1843,29 +1843,23 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
>   */
>  static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
>  {
> -	int lock_count = -1;
>  	struct mem_cgroup *iter, *failed = NULL;
>  	bool cond = true;
>
>  	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> -		bool locked = iter->oom_lock;
> -
> -		iter->oom_lock = true;
> -		if (lock_count == -1)
> -			lock_count = iter->oom_lock;
> -		else if (lock_count != locked) {
> +		if (iter->oom_lock) {
>  			/*
>  			 * this subtree of our hierarchy is already locked
>  			 * so we cannot give a lock.
>  			 */
> -			lock_count = 0;
>  			failed = iter;
>  			cond = false;
> -		}
> +		} else
> +			iter->oom_lock = true;
>  	}
>
>  	if (!failed)
> -		goto done;
> +		return true;
>
>  	/*
>  	 * OK, we failed to lock the whole subtree so we have to clean up
> @@ -1879,8 +1873,7 @@ static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
>  		}
>  		iter->oom_lock = false;
>  	}
> -done:
> -	return lock_count;
> +	return false;
>  }
>
>  /*
> --
> 1.7.6

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner
From: KAMEZAWA Hiroyuki @ 2011-08-10 0:22 UTC
To: Johannes Weiner
Cc: Michal Hocko, linux-mm, Balbir Singh, Andrew Morton, linux-kernel

On Tue, 9 Aug 2011 17:37:32 +0200
Johannes Weiner <jweiner@redhat.com> wrote:

> On Tue, Aug 09, 2011 at 05:22:18PM +0200, Michal Hocko wrote:
> > On Tue 09-08-11 16:03:12, Johannes Weiner wrote:
> > >  	struct mem_cgroup *iter, *failed = NULL;
> > >  	bool cond = true;
> > >
> > >  	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> > > -		bool locked = iter->oom_lock;
> > > -
> > > -		iter->oom_lock = true;
> > > -		if (lock_count == -1)
> > > -			lock_count = iter->oom_lock;
> > > -		else if (lock_count != locked) {
> > > +		if (iter->oom_lock) {
> > >  			/*
> > >  			 * this subtree of our hierarchy is already locked
> > >  			 * so we cannot give a lock.
> > >  			 */
> > > -			lock_count = 0;
> > >  			failed = iter;
> > >  			cond = false;
> > > -		}
> > > +		} else
> > > +			iter->oom_lock = true;
> > >  	}
> > >
> > >  	if (!failed)
> >
> > We can return here and get rid of done label.
>
> Ah, right you are. Here is an update.
>
> ---
> From 86b36904033e6c6a1af4716e9deef13ebd31e64c Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <jweiner@redhat.com>
> Date: Tue, 9 Aug 2011 15:31:30 +0200
> Subject: [patch] memcg: fix hierarchical oom locking
>
> Commit "79dfdac memcg: make oom_lock 0 and 1 based rather than
> counter" tried to oom lock the hierarchy and roll back upon
> encountering an already locked memcg.
>
> The code is confused when it comes to detecting a locked memcg,
> though, so it would fail and roll back after locking one memcg and
> encountering an unlocked second one.
>
> The result is that oom-locking hierarchies fails unconditionally and
> that every oom killer invocation simply goes to sleep on the oom
> waitqueue forever. The tasks practically hang forever without anyone
> intervening, possibly holding locks that trip up unrelated tasks, too.
>
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Thread overview: 11+ messages
[not found] <cover.1310732789.git.mhocko@suse.cz>
[not found] ` <b24894c23d0bb06f849822cb30726b532ea3a4c5.1310732789.git.mhocko@suse.cz>
2011-07-20 5:55 ` [PATCH 2/2] memcg: change memcg_oom_mutex to spinlock KAMEZAWA Hiroyuki
2011-07-20 7:01 ` Michal Hocko
2011-07-20 6:34 ` Balbir Singh
2011-07-20 7:00 ` Michal Hocko
[not found] ` <44ec61829ed8a83b55dc90a7aebffdd82fe0e102.1310732789.git.mhocko@suse.cz>
2011-07-21 20:58 ` [PATCH 1/2 v2] memcg: make oom_lock 0 and 1 based rather than coutner Andrew Morton
2011-07-22 0:15 ` KAMEZAWA Hiroyuki
2011-08-09 14:03 ` Johannes Weiner
2011-08-09 15:22 ` Michal Hocko
2011-08-09 15:37 ` Johannes Weiner
2011-08-09 15:43 ` Michal Hocko
2011-08-10 0:22 ` KAMEZAWA Hiroyuki