From: Marc Duponcheel <marc@offline.be>
To: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
Marc Duponcheel <marc@offline.be>
Subject: Re: [Bug 49361] New: configuring TRANSPARENT_HUGEPAGE_ALWAYS can make system unresponsive and reboot
Date: Fri, 2 Nov 2012 19:00:32 +0100
Message-ID: <20121102180032.GA26700@offline.be>
In-Reply-To: <20121101171406.GC8218@suse.de>
Hi Mel,
Thanks for your interest.
It is 3.6.2 that I tested.
I am not sure when TRANSPARENT_HUGEPAGE_ALWAYS was introduced, but the
problem may have started the first time I had it configured.
In any case, I think I first saw the problem on 3.6.0 or on the 3.5.x
release that came just before 3.6.0.
Looking at the logs, the first time it happened was on Oct 7, but I am
not sure which exact kernel I was running then (it -could- have been
3.5.6).
I sincerely hope this helps...
As mentioned, the problem is not hard to reproduce.
Also, I can grant you ssh access to troubleshoot.
Have a nice day
On 2012 Nov 01, Mel Gorman wrote:
> On Mon, Oct 29, 2012 at 01:33:06PM -0700, David Rientjes wrote:
> > On Tue, 23 Oct 2012, David Rientjes wrote:
> >
> > > We'll need to collect some information before we can figure out what the
> > > problem is with 3.5.2.
> > >
>
> 3.6.2 or 3.5.2?
>
> The bug mentioned "recently" but does not say what the last known
> working kernel was. What is the most recent working kernel so the
> problem candidate can be narrowed down?
>
> > > First, let's take a look at khugepaged. By default, it's supposed to wake
> > > up rarely (at most once every 10s) and only scan 4096 pages before going
> > > back to sleep. Consistently very high CPU usage suggests the settings
> > > aren't the defaults. Can you do
> > >
> > > cat /sys/kernel/mm/transparent_hugepage/khugepaged/{alloc,scan}_sleep_millisecs
> > >
> > > The defaults should be 60000 and 10000, respectively. Then can you do
> > >
> > > cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
> > >
> > > which should be 4096. If those are your settings, then it seems like
> > > khugepaged in 3.5.2 is going crazy and we'll need to look into that. Try
> > > collecting
> > >
> > > grep -E "thp|compact" /proc/vmstat
> > >
> > > and
> > >
> > > cat /proc/$(pidof khugepaged)/stack
> > >
> > > appended to a logfile at regular intervals after you start the build with
> > > transparent hugepages set to "always". After the machine becomes
> > > unresponsive and reboots, post that log.
> > >
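The collection procedure David asks for can be scripted roughly as follows. This is a minimal POSIX sh sketch, not something posted in the thread: the function names (check_defaults, log_sample, collect), the THP_DIR and VMSTAT overrides and the "run" argument are hypothetical. The defaults it checks (60000, 10000, 4096) are the ones quoted above. It uses grep -E so that "thp|compact" is an alternation, and it only reads the khugepaged stack when that file is readable (normally root only).

```shell
#!/bin/sh
# Sketch of the data collection requested above; function names, the
# THP_DIR/VMSTAT overrides and the "run" argument are hypothetical.
THP_DIR=${THP_DIR:-/sys/kernel/mm/transparent_hugepage/khugepaged}
VMSTAT=${VMSTAT:-/proc/vmstat}

# Print each khugepaged tunable next to the default quoted in the mail;
# return non-zero if any of them differs.
check_defaults() {
    status=0
    for pair in alloc_sleep_millisecs=60000 scan_sleep_millisecs=10000 \
                pages_to_scan=4096; do
        name=${pair%%=*}
        want=${pair#*=}
        have=$(cat "$THP_DIR/$name" 2>/dev/null || echo missing)
        if [ "$have" = "$want" ]; then
            echo "$name=$have (default)"
        else
            echo "$name=$have (default is $want)"
            status=1
        fi
    done
    return "$status"
}

# Append one timestamped sample to the log file named by $1.
log_sample() {
    {
        date
        # -E makes "|" an alternation; with -e it would match a literal "|".
        grep -E 'thp|compact' "$VMSTAT"
        pid=$(pidof khugepaged 2>/dev/null)
        # The stack file is normally readable by root only.
        if [ -n "$pid" ] && [ -r "/proc/$pid/stack" ]; then
            cat "/proc/$pid/stack"
        fi
    } >> "$1"
    # Flush so earlier samples are more likely to survive a sudden reboot.
    sync
}

# Take $2 samples (0 means until interrupted), $3 seconds apart.
collect() {
    log=$1 samples=${2:-0} interval=${3:-10} n=0
    while [ "$samples" -eq 0 ] || [ "$n" -lt "$samples" ]; do
        log_sample "$log"
        n=$((n + 1))
        sleep "$interval"
    done
}

if [ "${1:-}" = run ]; then
    log=${2:-thp-debug.log}
    check_defaults >> "$log"
    collect "$log" 0 10    # sample every 10 s until interrupted
fi
```

For example, as root, `sh thp-log.sh run /root/thp.log &` just before starting the build (the script name is hypothetical); after the reboot, /root/thp.log is the log to post.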
> >
> > This looks like an overly aggressive memory compaction issue; consider
> > these samples from your "49361.1" attachment:
> >
> > Sat Oct 27 02:39:05 CEST 2012
> > compact_blocks_moved 488381
> > compact_pages_moved 581856
> > compact_pagemigrate_failed 52533
> > compact_stall 59
> > compact_fail 36
> > compact_success 23
> > Sat Oct 27 02:39:15 CEST 2012
> > compact_blocks_moved 7797480
> > compact_pages_moved 589996
> > compact_pagemigrate_failed 53507
> > compact_stall 90
> > compact_fail 56
> > compact_success 24
> > Sat Oct 27 02:43:07 CEST 2012
> > compact_blocks_moved 276422153
> > compact_pages_moved 597836
> > compact_pagemigrate_failed 53886
> > compact_stall 109
> > compact_fail 76
> > compact_success 26
> >
> > In four minutes, transparent hugepage allocation has scanned 275933772
> > pageblocks (2MB each) and only three times managed to defragment enough
> > memory for the allocation to succeed. Over the 50 compaction stalls in
> > that window (compact_stall 59 to 109), it is scanning on average 5518675
> > pageblocks each time it is invoked.
> >
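The figures above are plain differences between the first and last snapshot. A short sketch that reproduces them from such a log, assuming one "counter value" pair per line with the date lines in between and no quote prefixes (the function name compaction_summary is hypothetical). Dividing by compact_stall follows the reading above that each stall is one invocation.

```shell
#!/bin/sh
# Sketch: differences between the first and last value of each counter in a
# vmstat log; assumes "name value" lines, other lines (dates) are skipped.
compaction_summary() {
    awk '
        NF == 2 && $2 ~ /^[0-9]+$/ {
            if (!($1 in first)) first[$1] = $2   # earliest sample
            last[$1] = $2                        # latest sample
        }
        END {
            blocks  = last["compact_blocks_moved"] - first["compact_blocks_moved"]
            stalls  = last["compact_stall"] - first["compact_stall"]
            success = last["compact_success"] - first["compact_success"]
            printf "blocks=%d stalls=%d success=%d", blocks, stalls, success
            # Average pageblocks scanned per compaction stall, truncated.
            if (stalls > 0) printf " blocks_per_stall=%d", blocks / stalls
            printf "\n"
        }' "$@"
}

# Summarise the log files given on the command line, if any.
if [ "$#" -gt 0 ]; then
    compaction_summary "$@"
fi
```

Fed the three snapshots quoted above, it prints `blocks=275933772 stalls=50 success=3 blocks_per_stall=5518675`, the same figures as in the text (the last one truncated from 5518675.44).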
>
> We recently had a bug report about excessive scanning and lock contention
> within compaction after lumpy reclaim was removed between 3.4 and 3.5.
> The impact was not obvious while lumpy reclaim was in place, because
> compaction was used less frequently; once that crutch went away, it fell
> over. The "solution" as it stands right now is the following series of
> patches on top of 3.6. Can they be tested, please?
>
> e64c5237cf6ff474cb2f3f832f48f2b441dd9979 mm: compaction: abort compaction loop if lock is contended or run too long
> 3cc668f4e30fbd97b3c0574d8cac7a83903c9bc7 mm: compaction: move fatal signal check out of compact_checklock_irqsave
> 661c4cb9b829110cb68c18ea05a56be39f75a4d2 mm: compaction: Update try_to_compact_pages()kerneldoc comment
> 2a1402aa044b55c2d30ab0ed9405693ef06fb07c mm: compaction: acquire the zone->lru_lock as late as possible
> f40d1e42bb988d2a26e8e111ea4c4c7bac819b7e mm: compaction: acquire the zone->lock as late as possible
> 753341a4b85ff337487b9959c71c529f522004f4 revert "mm: have order > 0 compaction start off where it left"
> bb13ffeb9f6bfeb301443994dfbf29f91117dfb3 mm: compaction: cache if a pageblock was scanned and no pages were isolated
> c89511ab2f8fe2b47585e60da8af7fd213ec877e mm: compaction: Restart compaction from near where it left off
> 62997027ca5b3d4618198ed8b1aba40b61b1137b mm: compaction: clear PG_migrate_skip based on compaction and reclaim activity
> 0db63d7e25f96e2c6da925c002badf6f144ddf30 mm: compaction: correct the nr_strict va isolated check for CMA
>
> I can provide a monolithic patch of these commits if that is preferred.
>
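One way to test the series is to cherry-pick it onto v3.6 in a clone that already contains the listed commits. A hedged sketch: it assumes the list above is oldest first (the titles suggest so, since bb13ffeb adds the pageblock skip cache that 62997027 then clears), the default branch name thp-compaction-test and the "run" argument are hypothetical, and cherry-picks onto an older base may still conflict.

```shell
#!/bin/sh
# Sketch: apply the compaction series on a branch based on v3.6.
# Assumes the current directory is a Linux clone containing v3.6 and these
# commits, listed oldest first; the default branch name is hypothetical.
COMMITS="
e64c5237cf6ff474cb2f3f832f48f2b441dd9979
3cc668f4e30fbd97b3c0574d8cac7a83903c9bc7
661c4cb9b829110cb68c18ea05a56be39f75a4d2
2a1402aa044b55c2d30ab0ed9405693ef06fb07c
f40d1e42bb988d2a26e8e111ea4c4c7bac819b7e
753341a4b85ff337487b9959c71c529f522004f4
bb13ffeb9f6bfeb301443994dfbf29f91117dfb3
c89511ab2f8fe2b47585e60da8af7fd213ec877e
62997027ca5b3d4618198ed8b1aba40b61b1137b
0db63d7e25f96e2c6da925c002badf6f144ddf30
"

apply_series() {
    branch=${1:-thp-compaction-test}
    # Start from the 3.6 release tag, then apply the commits in list order.
    git checkout -b "$branch" v3.6 &&
    git cherry-pick $COMMITS    # unquoted on purpose: one argument per hash
}

if [ "${1:-}" = run ]; then
    apply_series "${2:-}"
fi
```

Run it from the top of the tree; if a cherry-pick stops on a conflict, the monolithic patch offered above is likely the simpler route.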
> > Adding Mel Gorman to the cc.
>
> Thanks David.
>
> --
> Mel Gorman
> SUSE Labs
--
Marc Duponcheel
Velodroomstraat 74 - 2600 Berchem - Belgium
+32 (0)478 68.10.91 - marc@offline.be