From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx187.postini.com [74.125.245.187])
	by kanga.kvack.org (Postfix) with SMTP id 80B006B0071
	for ; Tue, 10 Jul 2012 17:20:01 -0400 (EDT)
Date: Tue, 10 Jul 2012 14:19:59 -0700
From: Andrew Morton
Subject: Re: [patch 3/5] mm, memcg: introduce own oom handler to iterate only over its own threads
Message-Id: <20120710141959.b6a3ecbe.akpm@linux-foundation.org>
In-Reply-To:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: David Rientjes
Cc: KAMEZAWA Hiroyuki, Michal Hocko, Johannes Weiner, KOSAKI Motohiro,
	Minchan Kim, Oleg Nesterov, linux-mm@kvack.org, cgroups@vger.kernel.org

On Fri, 29 Jun 2012 14:06:56 -0700 (PDT)
David Rientjes wrote:

> The global oom killer is serialized by the zonelist being used in the
> page allocation.

Brain hurts.  Presumably this is referring to some lock within the
zonelist.  Clarify, please?

> Concurrent oom kills are thus a rare event and only
> occur in systems using mempolicies and with a large number of nodes.
>
> Memory controller oom kills, however, can frequently be concurrent since
> there is no serialization once the oom killer is called for oom
> conditions in several different memcgs in parallel.
>
> This creates a massive contention on tasklist_lock since the oom killer
> requires the readside for the tasklist iteration.  If several memcgs are
> calling the oom killer, this lock can be held for a substantial amount of
> time, especially if threads continue to enter it as other threads are
> exiting.
>
> Since the exit path grabs the writeside of the lock with irqs disabled in
> a few different places, this can cause a soft lockup on cpus as a result
> of tasklist_lock starvation.
>
> The kernel lacks unfair writelocks, and successful calls to the oom
> killer usually result in at least one thread entering the exit path, so
> an alternative solution is needed.
>
> This patch introduces a separate oom handler for memcgs so that they do
> not require tasklist_lock for as much time.  Instead, it iterates only
> over the threads attached to the oom memcg and grabs a reference to the
> selected thread before calling oom_kill_process() to ensure it doesn't
> prematurely exit.
>
> This still requires tasklist_lock for the tasklist dump, iterating
> children of the selected process, and killing all other threads on the
> system sharing the same memory as the selected victim.  So while this
> isn't a complete solution to tasklist_lock starvation, it significantly
> reduces the amount of time that it is held.
>
>
> ...
>
> @@ -1469,6 +1469,65 @@ u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
>  	return min(limit, memsw);
>  }
>
> +void __mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> +				int order)

Perhaps have a comment over this function explaining why it exists?

> +{
> +	struct mem_cgroup *iter;
> +	unsigned long chosen_points = 0;
> +	unsigned long totalpages;
> +	unsigned int points = 0;
> +	struct task_struct *chosen = NULL;
> +
> +	totalpages = mem_cgroup_get_limit(memcg) >> PAGE_SHIFT ? : 1;
> +	for_each_mem_cgroup_tree(iter, memcg) {
> +		struct cgroup *cgroup = iter->css.cgroup;
> +		struct cgroup_iter it;
> +		struct task_struct *task;
> +
> +		cgroup_iter_start(cgroup, &it);
> +		while ((task = cgroup_iter_next(cgroup, &it))) {
> +			switch (oom_scan_process_thread(task, totalpages, NULL,
> +							false)) {
> +			case OOM_SCAN_SELECT:
>
> ...
>