From: Daniel Micay <danielmicay@gmail.com>
To: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Aliaksey Kandratsenka <alkondratenko@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Shaohua Li <shli@fb.com>,
linux-mm@kvack.org, linux-api@vger.kernel.org,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Mel Gorman <mel@csn.ul.ie>, Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.cz>,
Andy Lutomirski <luto@amacapital.net>,
"google-perftools@googlegroups.com"
<google-perftools@googlegroups.com>
Subject: Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
Date: Wed, 25 Mar 2015 23:24:54 -0400
Message-ID: <55137C06.9020608@gmail.com>
In-Reply-To: <alpine.DEB.2.10.1503251914260.16714@chino.kir.corp.google.com>

It's all well and good to say that you shouldn't do that, but it's the
basis of the design in jemalloc and other zone-based arena allocators.
There's a chosen chunk size and chunks are naturally aligned. An
allocation is either a span of chunks (chunk-aligned) or has its metadata
stored in the chunk header. This also means chunks can be assigned to
arenas for a high level of concurrency. Thread caching is then only
necessary for batching operations to amortize the cost of locking, rather
than to reduce contention. Per-CPU arenas can be implemented quite well
by using sched_getcpu() to move a thread to another arena whenever the
allocator detects that a different thread has allocated from its arena.
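
To make the chunk layout above concrete, here is a minimal sketch of how naturally aligned chunks let an allocator find metadata by masking a pointer. The 2 MiB chunk size, the `chunk_header` struct, and the function names are assumptions chosen for illustration; they are not jemalloc's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical chunk size: 2 MiB. It must be a power of two for masking. */
#define CHUNK_SIZE ((uintptr_t)1 << 21)
#define CHUNK_MASK (CHUNK_SIZE - 1)

static_assert((CHUNK_SIZE & CHUNK_MASK) == 0, "chunk size must be a power of two");

/* Hypothetical per-chunk metadata placed at the start of every chunk. */
struct chunk_header {
    unsigned arena_index; /* arena that owns this chunk */
};

/* Round any pointer into a chunk down to the start of that chunk. */
static struct chunk_header *chunk_of(const void *ptr)
{
    return (struct chunk_header *)((uintptr_t)ptr & ~CHUNK_MASK);
}

/* A chunk-aligned pointer denotes an allocation made of whole chunks. */
static int is_chunk_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & CHUNK_MASK) == 0;
}
```

Because the header address is computed from the pointer alone, a free() path in such a design can locate the owning arena without a global lookup structure for small allocations.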

With chunks of 2M or larger, madvise purging works very well at the
chunk level, but there's also fine-grained purging within chunks, and
that completely breaks down under THP page faults.
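
As a rough illustration of fine-grained purging, the sketch below releases only the whole pages inside a byte range with madvise(MADV_DONTNEED). The helper name `purge_range` and its rounding policy are assumptions for this example, not code from jemalloc.

```c
#define _DEFAULT_SOURCE /* exposes madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Purge the whole pages inside [addr, addr + len). Partial pages at either
 * end are left alone because they may still hold live data.
 * Returns 0 on success or when no whole page is covered, -1 if madvise fails.
 */
static int purge_range(void *addr, size_t len)
{
    assert(addr != NULL);
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = ((uintptr_t)addr + page - 1) & ~(page - 1); /* round up */
    uintptr_t end = ((uintptr_t)addr + len) & ~(page - 1);        /* round down */
    if (end <= start)
        return 0;
    return madvise((void *)start, end - start, MADV_DONTNEED);
}
```

On a private anonymous mapping, the purged pages read back as zeroes on the next access. If that access is served by a huge page fault, far more memory than the touched page becomes resident again, which is the failure mode described above.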

The allocator packs memory towards low addresses (address-ordered
best-fit and first-fit can both be done in O(log n) time), so swings in
memory usage will tend to clear large spans of memory, which will then
fault in huge pages no matter how the memory was mapped. Once MADV_FREE
can be used rather than MADV_DONTNEED, this would only happen after
memory pressure... but that's not very comforting.
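
One way to get address-ordered first-fit in O(log n) is a max-augmented tree over free runs sorted by address. The sketch below uses a small fixed-size segment tree purely for illustration; the slot count and function names are assumptions, and real allocators typically use balanced trees keyed on address or size instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of address-ordered free-run slots (a power of two). */
#define NSLOTS 8

/* tree[1] is the root; leaf tree[NSLOTS + i] holds the free size of slot i.
 * Each inner node stores the maximum free size found in its subtree. */
static size_t tree[2 * NSLOTS];

static size_t max_size(size_t a, size_t b) { return a > b ? a : b; }

/* Record that slot i (lower index means lower address) has `size` bytes free. */
static void set_free(size_t i, size_t size)
{
    assert(i < NSLOTS);
    size_t node = NSLOTS + i;
    tree[node] = size;
    for (node /= 2; node >= 1; node /= 2)
        tree[node] = max_size(tree[2 * node], tree[2 * node + 1]);
}

/* Return the lowest-address slot with at least `need` bytes free,
 * or NSLOTS if no slot is large enough. Takes O(log NSLOTS) steps. */
static size_t first_fit(size_t need)
{
    if (tree[1] < need)
        return NSLOTS;
    size_t node = 1;
    while (node < NSLOTS) /* prefer the left (lower-address) child */
        node = tree[2 * node] >= need ? 2 * node : 2 * node + 1;
    return node - NSLOTS;
}
```

Always descending into the left child when it can satisfy the request is what packs allocations towards low addresses.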

I don't find it acceptable that programs can leak huge amounts of memory
(up to ~30% in real programs) over time due to THP page faults. This is
a very real problem impacting projects like Redis, MariaDB and Firefox,
because they all use jemalloc.

https://shk.io/2015/03/22/transparent-huge-pages/
https://www.percona.com/blog/2014/07/23/why-tokudb-hates-transparent-hugepages/
http://dev.nuodb.com/techblog/linux-transparent-huge-pages-jemalloc-and-nuodb
https://bugzilla.mozilla.org/show_bug.cgi?id=770612

Bionic (Android's libc) switched over to jemalloc too.

The only reason you don't hear about this with glibc is that it doesn't
have aggressive, fine-grained purging and a low-fragmentation design in
the first place.