linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: linux-mm@kvack.org
Subject: Re: mm: Can we bail out p?d_alloc() loops upon SIGKILL?
Date: Thu, 28 Feb 2019 10:26:41 +0100
Message-ID: <20190228092641.GW10588@dhcp22.suse.cz>
In-Reply-To: <ccd9e864-0e47-b0e3-8d0e-9431937b604c@i-love.sakura.ne.jp>

On Wed 27-02-19 19:39:19, Tetsuo Handa wrote:
> On 2019/02/27 18:21, Michal Hocko wrote:
> > On Wed 27-02-19 12:43:51, Tetsuo Handa wrote:
> >> I noticed that when a kdump kernel triggers the OOM killer because too
> >> small a value was given to the crashkernel= parameter, the OOM reaper tends to
> >> fail to reclaim memory from OOM victims because they are in dup_mm() from
> >> copy_mm() from copy_process() with mmap_sem held for write.
> > 
> > I would presume that a page table allocation would fail for the oom
> > victim as soon as the oom memory reserves get depleted and then
> > copy_page_range would bail out and release the lock. That being
> > said, the oom_reaper might bail out before then but does sprinkling
> > fatal_signal_pending checks into copy_*_range really help reliably?
> > 
> 
> Yes, I think so. The OOM victim was just sleeping at might_sleep_if()
> rather than continuing allocations until an ALLOC_OOM allocation fails.
> Maybe the fact that the kdump kernel enables only one CPU somehow
> contributed to the OOM reaper giving up before an ALLOC_OOM allocation
> fails. But if an OOM victim in a normal kernel had a huge memory mapping
> where p?d_alloc() is called many times, and the kernel frequently
> prevented the OOM victim from continuing ALLOC_OOM allocations, hitting
> this problem might not be rare (I don't have a huge machine for testing
> an intensive p?d_alloc() loop).

We cannot do anything about the preemption, so that is moot. The ALLOC_OOM
reserve is limited, so the failure should happen sooner or later. But
I would be OK with checking for fatal_signal_pending once per pmd or so if
that helps and doesn't add noticeable overhead.
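
For illustration only, a minimal userspace sketch of what a once-per-pmd
check could look like. The struct, the function name and the -EINTR return
value are made up for this example and are not taken from any patch; in the
kernel the test would be fatal_signal_pending(current) somewhere in the
copy_*_range loops.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state for the sketch; nothing here is a kernel structure. */
struct walk_state {
    bool fatal_signal;   /* models fatal_signal_pending(current) */
    long pmds_copied;    /* how many pmd entries were processed */
};

/*
 * Walk nr_pmds entries and test for a fatal signal once per pmd.
 * kill_at is the index at which the sketch simulates SIGKILL arriving
 * (pass -1 for "never"). Returns 0 on completion, -EINTR on bail out.
 */
static int copy_pmd_range_sketch(struct walk_state *st, long nr_pmds,
                                 long kill_at)
{
    assert(nr_pmds >= 0);

    for (long i = 0; i < nr_pmds; i++) {
        if (i == kill_at)
            st->fatal_signal = true;    /* simulated SIGKILL */

        /* One cheap flag test per pmd, not per pte. */
        if (st->fatal_signal)
            return -EINTR;

        /* Stand-in for allocating and copying one pmd's worth of ptes. */
        st->pmds_copied++;
    }
    return 0;
}
```

With kill_at = 3 the walk stops after three pmds and the caller can unwind
and drop the lock; the cost on the normal path is one flag test per pmd.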

> Technically, it would be possible to use a per task_struct flag
> which allows __alloc_pages_nodemask() to check early and bail out:
> 
>   down_write(&current->mm->mmap_sem);
>   current->no_oom_alloc = 1;
>   while (...) {
>       p?d_alloc();
>   }
>   current->no_oom_alloc = 0;
>   up_write(&current->mm->mmap_sem);

Looks like a hack to me. We already have __GFP_NOMEMALLOC,
__GFP_MEMALLOC and PF_MEMALLOC, and you want yet another way to control
access to reserves. This is a mess. If anything, PF_NOMEMALLOC would
be a better fit, but the flag space is quite tight already. Besides that,
is this really worth doing when the caller can bail out?
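
To show how many knobs already interact, here is a rough userspace model
of the reserve access decision. It is a simplification under stated
assumptions: the M_* bit values are placeholders rather than the kernel's
real values, interrupt context is ignored, and the precedence shown is
only my reading of how the allocator combines these flags.

```c
#include <stdbool.h>

/* Placeholder bits for this model only; NOT the kernel's real values. */
#define M_GFP_NOMEMALLOC 0x1u   /* request: never touch reserves */
#define M_GFP_MEMALLOC   0x2u   /* request: may use all reserves */
#define M_PF_MEMALLOC    0x1u   /* task: may use all reserves */

enum reserve_access { RESERVE_NONE, RESERVE_OOM, RESERVE_ALL };

/* Hypothetical task model; oom_victim stands in for the OOM victim check. */
struct task_model {
    unsigned int flags;
    bool oom_victim;
};

/* Assumed precedence: the request opt-out wins, then full access, then OOM. */
static enum reserve_access reserve_access_model(unsigned int gfp,
                                                const struct task_model *t)
{
    if (gfp & M_GFP_NOMEMALLOC)
        return RESERVE_NONE;
    if (gfp & M_GFP_MEMALLOC)
        return RESERVE_ALL;
    if (t->flags & M_PF_MEMALLOC)
        return RESERVE_ALL;
    if (t->oom_victim)
        return RESERVE_OOM;     /* limited reserve, models ALLOC_OOM */
    return RESERVE_NONE;
}
```

A per-task no_oom_alloc flag would add a fifth input to this decision,
which is the mess referred to above.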
-- 
Michal Hocko
SUSE Labs


Thread overview: 6+ messages
2019-02-27  3:43 mm: Can we bail out p?d_alloc() loops upon SIGKILL? Tetsuo Handa
2019-02-27  9:21 ` Michal Hocko
2019-02-27 10:39   ` Tetsuo Handa
2019-02-28  9:26     ` Michal Hocko [this message]
2019-03-01 10:30       ` Tetsuo Handa
2019-03-01 11:49         ` Michal Hocko
