From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes <rientjes@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [RFC 1/4] mm, compaction: introduce kcompactd
Date: Tue, 21 Jul 2015 11:03:58 +0200
Message-ID: <55AE0AFE.8070200@suse.cz>
In-Reply-To: <alpine.DEB.2.10.1507091439100.17177@chino.kir.corp.google.com>

On 07/09/2015 11:53 PM, David Rientjes wrote:
> On Thu, 2 Jul 2015, Vlastimil Babka wrote:
>
>> Memory compaction can be currently performed in several contexts:
>>
>> - kswapd balancing a zone after a high-order allocation failure
>> - direct compaction to satisfy a high-order allocation, including THP page
>> fault attempts
>> - khugepaged trying to collapse a hugepage
>> - manually from /proc
>>
>> The purpose of compaction is two-fold. The obvious purpose is to satisfy a
>> (pending or future) high-order allocation, and is easy to evaluate. The other
>> purpose is to keep overall memory fragmentation low and help the
>> anti-fragmentation mechanism. The success wrt the latter purpose is more
>> difficult to evaluate.
>>
>> The current situation wrt the purposes has a few drawbacks:
>>
>> - compaction is invoked only when a high-order page or hugepage is not
>> available (or manually). This might be too late for the purposes of keeping
>> memory fragmentation low.
>> - direct compaction increases latency of allocations. Again, it would be
>> better if compaction was performed asynchronously to keep fragmentation low,
>> before the allocation itself comes.
>> - (a special case of the previous) the cost of compaction during THP page
>> faults can easily offset the benefits of THP.
>>
>> To improve the situation, we need an equivalent of kswapd, but for compaction.
>> E.g. a background thread which responds to fragmentation and the need for
>> high-order allocations (including hugepages) somewhat proactively.
>>
>> One possibility is to extend the responsibilities of kswapd, which could
>> however complicate its design too much. It should be better to let kswapd
>> handle reclaim, as order-0 allocations are often more critical than high-order
>> ones.
>>
>> Another possibility is to extend khugepaged, but this kthread is a single
>> instance and tied to THP configs.
>>
>> This patch goes with the option of a new set of per-node kthreads called
>> kcompactd, and lays the foundations. The lifecycle mimics kswapd kthreads.
>>
>> The work loop of kcompactd currently mimics a pageblock-order direct
>> compaction attempt each 15 seconds. This might not be enough to keep
>> fragmentation low, and needs evaluation.
>>
>> When there's not enough free memory for compaction, kswapd is woken up for
>> reclaim only (not compaction/reclaim).
>>
>> Further patches will add the ability to wake up kcompactd on demand in special
>> situations such as when hugepages are not available, or when a fragmentation
>> event occurred.
>>
>
> Thanks for looking at this again.
>
> The code is certainly clean and the responsibilities vs kswapd and
> khugepaged are clearly defined, but I'm not sure how receptive others
> would be of another per-node kthread.

We'll hopefully see...

> Khugepaged benefits from the periodic memory compaction being done
> immediately before it attempts to compact memory, and that may be lost
> with a de-coupled approach like this.

That could be helped by waking up khugepaged after kcompactd is
successful in making a hugepage available. Also, in your RFC you propose
the compaction period to be 15 minutes, while khugepaged wakes up each
10 (or 30) seconds by default for the scanning and collapsing, so only
a fraction of the work is attempted right after the compaction anyway?
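
To put rough numbers on that, here is a back-of-the-envelope sketch in C.
It is only an illustration: the helper name wakeups_per_compaction() is
made up, and the intervals are simply the ones mentioned above (a 15 minute
compaction period, khugepaged waking every 10 or 30 seconds), not values
read from any kernel source.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): how many khugepaged
 * wakeups fit into one periodic compaction interval. Both arguments
 * are in milliseconds.
 */
static unsigned long wakeups_per_compaction(unsigned long compact_period_ms,
                                            unsigned long scan_sleep_ms)
{
	/* A zero sleep interval would make the ratio meaningless. */
	assert(scan_sleep_ms > 0);
	return compact_period_ms / scan_sleep_ms;
}
```

With a 15 minute period (900000 ms) this gives 90 wakeups for a 10 second
sleep and 30 for a 30 second one, so roughly 1 in 90 (or 1 in 30) khugepaged
passes would run right after a compaction.
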
> Initially, I suggested implementing this inside khugepaged for that
> purpose, and the full compaction could be done on the next
> scan_sleep_millisecs wakeup before allocating a hugepage and when
> kcompactd_sleep_millisecs would have expired. So the true period between
> memory compaction events could actually be
> kcompactd_sleep_millisecs - scan_sleep_millisecs.
>
> You bring up an interesting point, though, about non-hugepage uses of
> memory compaction and its effect on keeping fragmentation low. I'm not
> sure of any reports of that actually being an issue in the wild?

Hm, reports of even not-so-high-order allocation failures occur from time
to time. Some might be from atomic context, but some are because
compaction just can't help due to the unmovable fragmentation. That's
mostly a guess, since such detailed information isn't there, but I think
Joonsoo did some experiments that confirmed this.

Also, effects on the fragmentation are evaluated when making changes to
compaction, see e.g. http://marc.info/?l=linux-mm&m=143634369227134&w=2
In the past, that evaluation has prevented changes that would improve
latency of direct compaction. They might be possible if there was a
reliable source of more thorough periodic compaction to counter the
not-so-thorough direct compaction.

> I know that the networking layer has done work recently to reduce page
> allocator latency for high-order allocations that can easily fallback to
> order-0 memory: see commit fb05e7a89f50 ("net: don't wait for order-3 page
> allocation").

Yep.

> The slub allocator does try to allocate its high-order memory with
> __GFP_WAIT before falling back to lower orders if possible. I would think
> that this would be the greatest sign of on-demand memory compaction being
> a problem, especially since CONFIG_SLUB is the default, but I haven't seen
> such reports.

Hm, it's true I don't remember such a report in the slub context.

> So I'm inclined to think that the current trouble spot for memory
> compaction is thp allocations. I may live to find differently :)

Yeah, it's the most troublesome one, but I wouldn't discount the others.

> How would you feel about implementing this as part of the khugepaged loop
> before allocating a hugepage and scanning memory?

Yeah, that's what the previous version did:
http://thread.gmane.org/gmane.linux.kernel.mm/132522
But I found it increasingly clumsy, and it is something that should not
depend on CONFIG_THP only.