From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932671AbbIUNrR (ORCPT <rfc822;w@1wt.eu>);
	Mon, 21 Sep 2015 09:47:17 -0400
Received: from mx1.redhat.com ([209.132.183.28]:39494 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932159AbbIUNrP (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 21 Sep 2015 09:47:15 -0400
Date: Mon, 21 Sep 2015 15:44:14 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kyle Walker <kwalker@redhat.com>, Christoph Lameter <cl@linux.com>,
        Michal Hocko <mhocko@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        David Rientjes <rientjes@google.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Vladimir Davydov <vdavydov@parallels.com>,
        linux-mm <linux-mm@kvack.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Stanislav Kozina <skozina@redhat.com>,
        Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: can't oom-kill zap the victim's memory?
Message-ID: <20150921134414.GA15974@redhat.com>
References: <1442512783-14719-1-git-send-email-kwalker@redhat.com> <20150919150316.GB31952@redhat.com> <CA+55aFwkvbMrGseOsZNaxgP3wzDoVjkGasBKFxpn07SaokvpXA@mail.gmail.com> <20150920125642.GA2104@redhat.com> <CA+55aFyajHq2W9HhJWbLASFkTx_kLSHtHuY6mDHKxmoW-LnVEw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+55aFyajHq2W9HhJWbLASFkTx_kLSHtHuY6mDHKxmoW-LnVEw@mail.gmail.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/20, Linus Torvalds wrote:
>
> On Sun, Sep 20, 2015 at 5:56 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > In this case the workqueue thread will block.
>
> What workqueue thread?

I must have missed something. I can't understand your and Michal's
concerns.

>    pagefault_out_of_memory ->
>       out_of_memory ->
>          oom_kill_process
>
> as far as I can tell, this can be called by any task. Now, that
> pagefault case should only happen when the page fault comes from user
> space, but we also have
>
>    __alloc_pages_slowpath ->
>       __alloc_pages_may_oom ->
>          out_of_memory ->
>             oom_kill_process
>
> which can be called from just about any context (but atomic
> allocations will never get here, so it can schedule etc).

So yes, in general oom_kill_process() can't call oom_unmap_func() directly.
That is why the patch uses queue_work(oom_unmap_func). The workqueue thread
takes mmap_sem and frees the memory allocated by user space.

If this can lead to deadlock somehow, then we can hit the same deadlock
when an oom-killed thread calls exit_mm().

> So what's your point?

This can help if the killed process refuse to die and (of course) it
doesn't hold the mmap_sem for writing. Say, it waits for some mutex
held by the task which tries to alloc the memory and triggers oom.

> Explain again just how do you guarantee that you
> can take the mmap_sem.

This is not guaranteed, down_read(mmap_sem) can block forever. But this
means that the (killed) victim never drops mmap_sem / never exits, so
we lose anyway. We have no memory, oom-killer is blocked, etc.

Oleg.