From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Manish Jaggi <mjaggi@caviumnetworks.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Possible race condition in oom-killer
Date: Fri, 28 Jul 2017 15:07:23 +0200
Message-ID: <20170728130723.GP2274@dhcp22.suse.cz>
In-Reply-To: <46e1e3ee-af9a-4e67-8b4b-5cf21478ad21@I-love.SAKURA.ne.jp>

On Fri 28-07-17 21:59:50, Tetsuo Handa wrote:
> (Oops. Forgot to add CC.)
>
> On 2017/07/28 21:32, Michal Hocko wrote:
> > [CC linux-mm]
> >
> > On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
> >> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
> >>
> >> Hi Michal,
> >> On 7/27/2017 2:54 PM, Michal Hocko wrote:
> >>> On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
> >>> [...]
> >>>> With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
> >>>> when running LTP oom01 /oom02 ltp tests on our arm64 hardware with ~256G memory and high core count.
> >>>> The issue experienced was as follows
> >>>> that either test (oom01/oom02) selected a pid as victim and waited for the pid to be killed.
> >>>> that pid was marked as killed but somewhere there is a race and the process didnt get killed.
> >>>> and the oom01/oom02 test started killing further processes, till it panics.
> >>>> IIUC this issue is quite similar to your patch description. But applying your patch I still see the issue.
> >>>> If it is not related to this patch, can you please suggest by looking at the log, what could be preventing
> >>>> the killing of victim.
> >>>>
> >>>> Log (https://pastebin.com/hg5iXRj2)
> >>>>
> >>>> As a subtest of oom02 starts, it prints out the victim - In this case 4578
> >>>>
> >>>> oom02 0 TINFO : start OOM testing for mlocked pages.
> >>>> oom02 0 TINFO : expected victim is 4578.
> >>>>
> >>>> When oom02 thread invokes oom-killer, it did select 4578 for killing...
> >>> I will definitely have a look. Can you report it in a separate email
> >>> thread please? Are you able to reproduce with the current Linus or
> >>> linux-next trees?
> >> Yes this issue is visible with linux-next.
> >
> > Could you provide the full kernel log from this run please? I do not
> > expect there to be much difference but just to be sure that the code I
> > am looking at matches logs.
>
> 4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
> mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), can it?

You are absolutely right. I am pretty sure I checked the mlocked counter
as the first thing, but that must have been in one of the earlier oom
reports. My fault, I didn't check it in the critical one:
[ 365.267347] oom_reaper: reaped process 4578 (oom02), now anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB
[ 365.282658] oom_reaper: reaped process 4583 (oom02), now anon-rss:131561664kB, file-rss:0kB, shmem-rss:0kB
and the above screamed at me that I was just completely blind.

Handling of mlocked pages has been on my todo list for quite some time
already, but I haven't gotten around to implementing it. The mlock code
is very tricky.
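
To make the point about VM_LOCKED concrete, here is a minimal userspace
sketch of why the reaper can report a process as "reaped" while almost all
of its anon-rss stays in place. It is a model, not kernel code: the struct
vma_model type, the rss_kb field, and the model_* function names are
hypothetical, the flag values are illustrative, and every VMA is assumed to
be private anonymous memory. The only behavior taken from this thread is
that can_madv_dontneed_vma() returns false for a VM_LOCKED mapping, so the
reaper skips it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the authoritative values are in the kernel headers. */
#define VM_PFNMAP   0x00000400UL
#define VM_LOCKED   0x00002000UL
#define VM_HUGETLB  0x00400000UL

/* Hypothetical stand-in for a VMA: flags plus resident size in kB. */
struct vma_model {
	unsigned long vm_flags;
	unsigned long rss_kb;
};

/*
 * Modeled on the check named in the thread: a mapping that is mlocked
 * (or hugetlb/PFN-mapped, an assumption of this sketch) is not eligible
 * for MADV_DONTNEED-style teardown.
 */
bool model_can_madv_dontneed_vma(const struct vma_model *vma)
{
	return !(vma->vm_flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP));
}

/*
 * Hypothetical reaper loop: walk the VMAs, skip the ineligible ones,
 * drop the resident pages of the rest, and return the kB freed.
 */
unsigned long model_reap(struct vma_model *vmas, size_t n)
{
	unsigned long freed_kb = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_can_madv_dontneed_vma(&vmas[i]))
			continue;	/* mlocked memory stays resident */
		freed_kb += vmas[i].rss_kb;
		vmas[i].rss_kb = 0;
	}
	return freed_kb;
}
```

With one large VM_LOCKED VMA and one small unlocked VMA, model_reap() frees
only the small one, which matches the shape of the log above: the victim is
reported as reaped while its anon-rss is essentially unchanged.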
--
Michal Hocko
SUSE Labs