From: Nitin Gupta <nitin.m.gupta@oracle.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>, Michal Hocko <mhocko@kernel.org>,
Nitin Gupta <nitingupta910@gmail.com>,
steven.sistare@oracle.com,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>, Nadav Amit <namit@vmware.com>,
Minchan Kim <minchan@kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Vegard Nossum <vegard.nossum@oracle.com>,
"Levin, Alexander" <alexander.levin@verizon.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>, Shaohua Li <shli@fb.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
David Rientjes <rientjes@google.com>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Dave Jiang <dave.jiang@intel.com>,
Jérôme Glisse <jglisse@redhat.com>,
Matthew Wilcox <willy@linux.intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Hugh Dickins <hughd@google.com>, Tobin C Harding <me@tobin.cc>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: Reduce memory bloat with THP
Date: Wed, 31 Jan 2018 17:09:48 -0800 [thread overview]
Message-ID: <c8e16ca6-b78d-6066-4d5a-bb6be337c93e@oracle.com> (raw)
In-Reply-To: <20180125211303.rbfeg7ultwr6hpd3@suse.de>
On 01/25/2018 01:13 PM, Mel Gorman wrote:
> On Thu, Jan 25, 2018 at 11:41:03AM -0800, Nitin Gupta wrote:
>>>> It's not really about memory scarcity but a more efficient use of it.
>>>> Applications may want hugepage benefits without requiring any changes to
>>>> app code which is what THP is supposed to provide, while still avoiding
>>>> memory bloat.
>>>>
>>> I read these links and find that there are mainly two complaints:
>>> 1. THP causes latency spikes, because direct compaction slows down THP allocation;
>>> 2. THP bloats memory footprint when jemalloc uses MADV_DONTNEED to return memory ranges smaller than
>>> THP size and fails because of THP.
>>>
>>> The first complaint is not related to this patch.
>>
>> I'm trying to address many different THP issues and memory bloat is
>> first among them.
>
> Expecting userspace to get this right is probably going to go sideways.
> It'll be screwed up and be sub-optimal or have odd semantics for existing
> madvise flags. The fact is that an application may not even know in
> advance whether it will use memory sparsely, for example if it is a
> computational workload modelling unknown input data.
>
> I suggest you read the old Talluri paper "Surpassing the TLB Performance
> of Superpages with Less Operating System Support" and pay attention to
> Section 4. There it discusses a page reservation scheme whereby on fault
> a naturally aligned set of base pages are reserved and only one correctly
> placed base page is inserted into the faulting address. It was tied into
> a hypothetical piece of hardware that doesn't exist to give best-effort
> support for superpages so it does not directly help you but the initial
> idea is sound. There are holes in the paper from today's perspective but
> it was written in the 90's.
>
> From there, read "Transparent operating system support for superpages"
> by Navarro, particularly chapter 4 paying attention to the parts where
> it talks about opportunism and promotion threshold.
>
> Superficially, it goes like this
>
> 1. On fault, reserve a THP in the allocator and use one base page that
> is correctly-aligned for the faulting addresses. By correctly-aligned,
> I mean that you use a base page whose offset would be naturally contiguous
> if it ever was part of a huge page.
> 2. On subsequent faults, attempt to use a base page that is naturally
> aligned to be a THP
> 3. When a "threshold" of base pages are inserted, allocate the remaining
> pages and promote it to a THP
> 4. If there is memory pressure, spill "reserved" pages into the main
> allocation pool and lose the opportunity to promote (which will need
> khugepaged to recover)
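[Editorial note: the four steps above can be sketched as a toy user-space model. This is purely illustrative; `struct region`, `fault_page()` and `PAGES_PER_HUGE` are made-up names (the latter standing in for HPAGE_PMD_NR), not kernel interfaces, and the model tracks only the bookkeeping, not page tables or the allocator.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGES_PER_HUGE 512 /* stand-in for HPAGE_PMD_NR on x86-64 */

/* One naturally aligned huge-page-sized region of a mapping. */
struct region {
	bool mapped[PAGES_PER_HUGE]; /* base pages inserted so far */
	int nr_mapped;               /* how many are inserted */
	int threshold;               /* promote when this many present */
	bool reserved;               /* huge page still held in reserve */
	bool promoted;               /* region became a THP */
};

static void region_init(struct region *r, int threshold)
{
	memset(r, 0, sizeof(*r));
	r->threshold = threshold;
	r->reserved = true; /* step 1: reserve a THP on the first fault */
}

/* Fault on base-page index @idx (0..PAGES_PER_HUGE-1 within the region). */
static void fault_page(struct region *r, int idx)
{
	if (r->promoted)
		return; /* already mapped huge, nothing to do */
	if (!r->mapped[idx]) {
		/* steps 1-2: insert only the correctly placed base page */
		r->mapped[idx] = true;
		r->nr_mapped++;
	}
	/* step 3: promote once the threshold of base pages is present */
	if (r->reserved && r->nr_mapped >= r->threshold)
		r->promoted = true;
}

/* Step 4: memory pressure spills the reservation; promotion is lost. */
static void spill_reservation(struct region *r)
{
	r->reserved = false;
}
```

With threshold 1 the model degenerates to today's behaviour (promote on the first fault); a spilled reservation never promotes, matching step 4 where only khugepaged could recover the huge mapping later.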
>
> By definition, a promotion threshold of 1 would be the existing scheme
> of allocating a THP on the first fault, and some users will want that. It
> also should be the default to avoid unexpected overhead. For workloads
> where memory is being sparsely addressed and the increased overhead of
> THP is unwelcome then the threshold should be tuned higher with a maximum
> possible value of HPAGE_PMD_NR.
>
> It's non-trivial to do this because at minimum a page fault has to check
> if there is a potential promotion candidate by checking the PTEs around
> the faulting address searching for a correctly-aligned base page that is
> already inserted. If there is, then check if the correctly aligned base
> page for the current faulting address is free and if so use it. It'll
> also then need to check the remaining PTEs to see whether the promotion
> threshold has been reached and, if so, promote it to a THP (or else teach
> khugepaged to do an in-place promotion if possible). In other words,
> implementing the promotion threshold is both hard and it's not free.
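[Editorial note: the per-fault cost described above can be made concrete with a small sketch. The function name and the `pte_present` array are hypothetical stand-ins for a scan of the real page tables; the point is that each fault does O(HPAGE_PMD_NR) work to count populated neighbours and test the threshold.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_HUGE 512 /* stand-in for HPAGE_PMD_NR */

/*
 * Scan the PTE slots of the naturally aligned region around the
 * faulting address, count base pages already inserted, and decide
 * whether this fault pushes the region over the promotion threshold.
 */
static bool fault_reaches_threshold(const bool pte_present[PAGES_PER_HUGE],
				    size_t fault_idx, int threshold)
{
	int populated = 0;
	size_t i;

	for (i = 0; i < PAGES_PER_HUGE; i++)
		if (pte_present[i])
			populated++;

	/* count the base page this fault is about to insert */
	if (!pte_present[fault_idx])
		populated++;

	return populated >= threshold;
}
```

Even this naive linear scan shows why the threshold is not free: it runs on every fault in the region until promotion or spill, which is the overhead a threshold of 1 (the default) avoids entirely.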
>
> However, if it did exist then the only tunable would be the "promotion
> threshold" and applications would not need any special awareness of their
> address space.
>
I went through both references you mentioned and I really like the
idea of reservation-based hugepage allocation. Navarro also extends
the idea to allow multiple hugepage sizes to be used (as supported by
the underlying hardware), which is next on the list of what I wanted
to do in THP.

So, please ignore this patch; I will work towards implementing the
ideas in these papers.
Thanks for the feedback.
Nitin