From: Johannes Stezenbach <js@sig21.net>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Michal Hocko <mhocko@kernel.org>
Subject: Re: 4.6.2 frequent crashes under memory + IO pressure
Date: Thu, 23 Jun 2016 11:18:30 +0200
Message-ID: <20160623091830.GA32535@sig21.net>
In-Reply-To: <c9c87635-6e00-5ce7-b05a-966011c8fe3f@I-love.SAKURA.ne.jp>

On Tue, Jun 21, 2016 at 08:47:51PM +0900, Tetsuo Handa wrote:
> Johannes Stezenbach wrote:
> >
> > a man's got to have a hobby, thus I'm running Android AOSP
> > builds on my home PC which has 4GB of RAM, 4GB swap.
> > Apparently it is not really adequate for the job but used to
> > work with a 4.4.10 kernel. Now I upgraded to 4.6.2
> > and it crashes usually within 30mins during compilation.
>
> Such reproducer is welcomed.
> You might be hitting OOM livelock using innocent workload.
>
> > The crash is a hard hang, mouse doesn't move, no reaction
> > to keyboard, nothing in logs (systemd journal) after reboot.
>
> Yes, it seems to me that your system is OOM livelocked.

From my crash log I can see that X is hanging in
i915_gem_object_get_pages_gtt, and the network is dead
because order-0 allocation failures cause a series of
"ath9k_htc: RX memory allocation error" messages, which is
what makes the issue so unpleasant.

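As an aside, the mode values in the allocation failure messages
quoted further down (0x2200020 and 0x2080020) can be decoded with
a few lines of Python. This is only a minimal sketch: the bit
values in the table are assumed to match include/linux/gfp.h in
4.6 and should be checked against the tree, the table is
deliberately partial, and decode_gfp is a made-up helper name.

```python
# Partial table of GFP bits. ASSUMPTION: values as in include/linux/gfp.h
# of a 4.6 kernel; they differ between kernel versions.
GFP_BITS_4_6 = {
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x80000: "__GFP_ATOMIC",
    0x200000: "__GFP_NOTRACK",
    0x400000: "__GFP_DIRECT_RECLAIM",
    0x2000000: "__GFP_KSWAPD_RECLAIM",
}


def decode_gfp(mask):
    """Return the names of the known bits set in mask, lowest bit first.

    Bits that are not in the table are returned as one trailing hex string.
    """
    names = []
    for bit in sorted(GFP_BITS_4_6):
        if mask & bit:
            names.append(GFP_BITS_4_6[bit])
            mask &= ~bit
    if mask:
        names.append(hex(mask))
    return names


# The two mode values from the log lines in this thread.
for mode in (0x2200020, 0x2080020):
    print(hex(mode), " | ".join(decode_gfp(mode)))
```

Under that assumption neither mask contains __GFP_DIRECT_RECLAIM,
i.e. both failing allocations come from contexts that cannot
reclaim memory themselves.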
The particular command which triggers it seems to be
Jill from the Android Java toolchain
(http://tools.android.com/tech-docs/jackandjill),
which runs as "java -Xmx3500m -jar $(JILL_JAR)", i.e.
potentially eating all my available RAM when linking
the Android framework.

Meanwhile I found some RAM, and linux-4.6.2 runs stable
with 8GB for this workload. The build time (for the
partial AOSP rebuild that fairly reliably triggered the hang)
dropped from ~20min to ~17min (so it wasn't thrashing too
badly), and swap usage dropped from ~50% (of 4GB) to <5%.

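For anyone who wants to watch swap usage during such a build, a
minimal sketch follows. It assumes the usual "Key:  value kB"
layout of /proc/meminfo; swap_used_percent is a made-up helper
name and the sample text is invented for illustration, not taken
from my machine.

```python
def swap_used_percent(meminfo_text):
    """Return swap usage in percent, given /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # value in kB
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - free) / total


# Invented sample: 4GB of swap, half of it in use.
SAMPLE = """\
MemTotal:        4194304 kB
SwapTotal:       4194304 kB
SwapFree:        2097152 kB
"""

print(swap_used_percent(SAMPLE))
```

On a live system the same function could be fed the contents of
/proc/meminfo, e.g. open("/proc/meminfo").read().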
> It is sad that we haven't merged kmallocwd which will report
> which memory allocations are stalling
> ( http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp ).

Would you like me to try it? It wouldn't prevent the hang, though,
just print better debug output to the serial console, right?
Or would it OOM kill some process?

> > Then I tried 4.5.7, it seems to be stable so far.
> >
> > I'm using dm-crypt + lvm + ext4 (swap also in lvm).
> >
> > Now I hooked up a laptop to the serial port and captured
> > some logs of the crash which seems to be repeating
> >
> > [ 2240.842567] swapper/3: page allocation failure: order:0, mode:0x2200020(GFP_NOWAIT|__GFP_HIGH|__GFP_NOTRACK)
> > or
> > [ 2241.167986] SLUB: Unable to allocate memory on node -1, gfp=0x2080020(GFP_ATOMIC)
> >
> > over and over. Based on the backtraces in the log I decided
> > to hot-unplug USB devices, and twice the kernel came
> > back to live, but on the 3rd crash it was dead for good.
>
> The values
>
> DMA free:12kB min:32kB
> DMA32 free:2268kB min:6724kB
> Normal free:84kB min:928kB
>
> suggest that memory reserves are spent for pointless purpose. Maybe your system is
> falling into situation which was mitigated by commit 78ebc2f7146156f4 ("mm,writeback:
> don't use memory reserves for wb_start_writeback"). Thus, applying that commit to
> your 4.6.2 kernel might help avoiding flood of these allocation failure messages.

I could try. Could you let me know whether booting with mem=4G
is equivalent to having only 4GB installed, or do I need to use
memmap= or physically remove the RAM (which is not so easy since
the CPU fan is in the way)?

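For reference, the zone values you quoted can be checked
mechanically. The sketch below is only an illustration: it parses
the condensed "ZONE free:NkB min:NkB" form used in this mail, not
the full per-zone lines the kernel prints, and zones_below_min is
a made-up helper name.

```python
import re

# Matches the condensed form quoted in this thread, e.g. "DMA free:12kB min:32kB".
ZONE_RE = re.compile(r"^\s*(\S+)\s+free:(\d+)kB\s+min:(\d+)kB")


def zones_below_min(text):
    """Return (zone, free_kB, min_kB) for every zone with free < min."""
    result = []
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if not m:
            continue
        name, free, minimum = m.group(1), int(m.group(2)), int(m.group(3))
        if free < minimum:
            result.append((name, free, minimum))
    return result


# The three zone lines as quoted above.
QUOTED = """\
DMA free:12kB min:32kB
DMA32 free:2268kB min:6724kB
Normal free:84kB min:928kB
"""

for name, free, minimum in zones_below_min(QUOTED):
    print(f"{name}: free {free}kB is below min {minimum}kB")
```

All three zones are below their min watermark, which matches your
reading that the memory reserves are being eaten.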
> > Before I pressed the reset button I used SysRq-W. At the bottom
> > is a "BUG: workqueue lockup", it could be the result of
> > the log spew on serial console taking so long but it looks
> > like some IO is never completing.
>
> But even after you apply that commit, I guess you will still see silent hang up
> because the page allocator would think there is still reclaimable memory. So, is
> it possible to also try current linux.git kernels? I'd like to know whether
> "OOM detection rework" (which went to 4.7) helps giving up reclaiming and
> invoking the OOM killer with your workload.
>
> Maybe __GFP_FS allocations start invoking the OOM killer. But maybe __GFP_FS
> allocations still remain stuck waiting for !__GFP_FS allocations whereas !__GFP_FS
> allocations gives up without invoking the OOM killer (i.e. effectively no "give up").

I could try that, too. Same question about mem= though.

What is your opinion about older kernels (4.4, 4.5) working?
I think I've seen some OOM messages with the older kernels:
Jill was killed and I restarted the build to complete it.

A full bisect would take more than a day, and I don't think
I have the time for it.

Since I use dm-crypt + lvm, should we add more Cc or do
you think it is an mm issue?

> > Below I'm pasting some log snippets, let me know if you like
> > it so much you want more of it ;-/ The total log is about 1.7MB.
>
> Yes, I'd like to browse it. Could you send it to me?
Did you get any additional insights from it?
Thanks,
Johannes