Date: Mon, 21 Sep 2015 15:44:14 +0200
From: Oleg Nesterov
To: Linus Torvalds
Cc: Kyle Walker, Christoph Lameter, Michal Hocko, Andrew Morton, David Rientjes, Johannes Weiner, Vladimir Davydov, linux-mm, Linux Kernel Mailing List, Stanislav Kozina, Tetsuo Handa
Subject: Re: can't oom-kill zap the victim's memory?
Message-ID: <20150921134414.GA15974@redhat.com>
References: <1442512783-14719-1-git-send-email-kwalker@redhat.com> <20150919150316.GB31952@redhat.com> <20150920125642.GA2104@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/20, Linus Torvalds wrote:
>
> On Sun, Sep 20, 2015 at 5:56 AM, Oleg Nesterov wrote:
> >
> > In this case the workqueue thread will block.
>
> What workqueue thread?

I must have missed something. I can't understand your and Michal's concerns.

>   pagefault_out_of_memory ->
>      out_of_memory ->
>         oom_kill_process
>
> as far as I can tell, this can be called by any task. Now, that
> pagefault case should only happen when the page fault comes from user
> space, but we also have
>
>   __alloc_pages_slowpath ->
>      __alloc_pages_may_oom ->
>         out_of_memory ->
>            oom_kill_process
>
> which can be called from just about any context (but atomic
> allocations will never get here, so it can schedule etc).

So yes, in general oom_kill_process() can't call oom_unmap_func() directly.
That is why the patch uses queue_work(oom_unmap_func).
The workqueue thread takes mmap_sem and frees the memory allocated by user
space. If this can lead to deadlock somehow, then we can hit the same
deadlock when an oom-killed thread calls exit_mm().

> So what's your point?

This can help if the killed process refuses to die and (of course) it
doesn't hold the mmap_sem for writing. Say, it waits for some mutex held
by the task which tries to allocate memory and triggers oom.

> Explain again just how do you guarantee that you
> can take the mmap_sem.

This is not guaranteed: down_read(mmap_sem) can block forever. But this
means that the (killed) victim never drops mmap_sem / never exits, so we
lose anyway. We have no memory, oom-killer is blocked, etc.

Oleg.