From: Mel Gorman <mgorman@techsingularity.net>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
aarcange@redhat.com, aneesh.kumar@linux.ibm.com,
akpm@linux-foundation.org, jglisse@redhat.com,
khandual@linux.vnet.ibm.com, kirill.shutemov@linux.intel.com,
mhocko@kernel.org, minchan@kernel.org, peterz@infradead.org,
rientjes@google.com, vbabka@suse.cz, willy@infradead.org,
ying.huang@intel.com, nitingupta910@gmail.com
Subject: Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory
Date: Fri, 9 Nov 2018 13:11:28 +0000
Message-ID: <20181109131128.GE23260@techsingularity.net>
In-Reply-To: <20181109121318.3f3ou56ceegrqhcp@kshutemo-mobl1>
On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote:
> On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
> > The basic idea as outlined by Mel Gorman in [2] is:
> >
> > 1) On first fault in a sufficiently sized range, allocate a huge page
> > sized and aligned block of base pages. Map the base page
> > corresponding to the fault address and hold the rest of the pages in
> > reserve.
> > 2) On subsequent faults in the range, map the pages from the reservation.
> > 3) When enough pages have been mapped, promote the mapped pages and
> > remaining pages in the reservation to a huge page.
> > 4) When there is memory pressure, release the unused pages from their
> > reservations.
>
> I haven't yet read the patch in details, but I'm skeptical about the
> approach in general for few reasons:
>
> - PTE page table retracting to replace it with huge PMD entry requires
> down_write(mmap_sem). It makes the approach not practical for many
> multi-threaded workloads.
>
> I don't see a way to avoid exclusive lock here. I will be glad to
> be proved otherwise.
>
That problem is somewhat fundamental to the mmap_sem itself and
conceivably it could be alleviated by range-locking (if that gets
completed). The other thing to bear in mind is the timing. If the
promotion is in-place due to reservations, there isn't the allocation
overhead and the hold times *should* be short.
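To make the in-place point concrete, here is a minimal userspace sketch of
the four steps quoted above. It is a model only, not the patch's code: the
struct, the function names and the 512-page constant (2 MiB huge pages over
4 KiB base pages, as on x86-64) are all assumptions for illustration. The
thing to notice is that promotion only flips state on pages that are already
reserved, so no allocation happens at that point. The mmap_sem and TLB flush
costs discussed in this thread are deliberately not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_HUGE 512 /* assumption: 2 MiB huge page / 4 KiB base page */

/* Hypothetical model of one huge-page-sized, huge-page-aligned reservation. */
struct thp_reservation {
    bool mapped[PAGES_PER_HUGE]; /* base pages already mapped by faults */
    size_t nr_mapped;            /* number of true entries in mapped[] */
    bool promoted;               /* range is now backed by one huge page */
    bool released;               /* unused pages were given back */
};

/* Step 1: the first fault in the range reserves the whole block. */
void reservation_init(struct thp_reservation *r)
{
    for (size_t i = 0; i < PAGES_PER_HUGE; i++)
        r->mapped[i] = false;
    r->nr_mapped = 0;
    r->promoted = false;
    r->released = false;
}

/*
 * Steps 2 and 3: map the faulting base page from the reservation and
 * promote once nr_mapped reaches the threshold. Returns true only for
 * the fault that triggers promotion. No memory is allocated here.
 */
bool reservation_fault(struct thp_reservation *r, size_t idx, size_t threshold)
{
    if (idx >= PAGES_PER_HUGE || r->promoted || r->released)
        return false;
    if (!r->mapped[idx]) {
        r->mapped[idx] = true;
        r->nr_mapped++;
    }
    if (r->nr_mapped >= threshold) {
        r->promoted = true;
        return true;
    }
    return false;
}

/*
 * Step 4: under memory pressure, release the unused pages of a
 * reservation that has not been promoted. Returns the pages freed.
 */
size_t reservation_shrink(struct thp_reservation *r)
{
    if (r->promoted || r->released)
        return 0;
    r->released = true;
    return PAGES_PER_HUGE - r->nr_mapped;
}
```

In this model a released reservation can no longer be promoted in place,
which is the point where a real implementation would have to fall back to
some other promotion path.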
> - The promotion will also require TLB flush which might be prohibitively
> slow on big machines.
>
That may be offset either a) by setting the threshold to 1 in cases
where the promotion should always be immediate or b) by reduced memory
consumption, potentially avoiding premature reclaim, in others.
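As a rough illustration of that trade-off, the arithmetic below models how
many base pages a single huge-page-sized range ends up holding once reclaim
has released any unused reservation. The function name and the 512-page
constant are assumptions, and the model ignores everything except the
threshold. With a threshold of 1 the first touch promotes, which matches
what THP does today when it allocates a huge page on first fault.

```c
#include <stddef.h>

#define PAGES_PER_HUGE 512 /* assumption: 2 MiB huge page / 4 KiB base page */

/*
 * Base pages held by one range after unused reservation pages have been
 * released. 'touched' is how many distinct base pages were faulted in.
 * At or above the threshold the range was promoted and holds a full
 * huge page; below it, only the touched pages remain.
 */
size_t resident_pages(size_t touched, size_t threshold)
{
    if (touched == 0)
        return 0;
    if (touched > PAGES_PER_HUGE)
        touched = PAGES_PER_HUGE;
    return touched >= threshold ? PAGES_PER_HUGE : touched;
}
```

For example, a range with 8 touched pages holds 512 pages at a threshold of
1 but only 8 pages at a threshold of 64, which is the memory that could
otherwise push the system into reclaim earlier.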
> - Short living processes will fail to benefit from THP with the policy,
> even with plenty of free memory in the system: no time to promote to THP
> or, with synchronous promotion, cost will overweight the benefit.
>
Short-lived processes are also not going to be dominated by the TLB
refill cost so I think that's somewhat unfair. Potential means of
mitigating this include per-task promotion thresholds via either prctl
or a task-wide policy inherited across exec.
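A sketch of how such a per-task override might resolve, purely as a model:
no prctl option for this exists in the patch or in this thread, and every
name below (the struct, the "unset" value, both functions) is hypothetical.
It only shows the lookup order, a per-task value if one was set and the
system default otherwise, and that the value survives exec.

```c
#include <stddef.h>

#define THP_THRESHOLD_UNSET 0 /* hypothetical: task has no override */

/* Hypothetical per-task policy; a real kernel would keep this elsewhere. */
struct task_policy {
    size_t thp_threshold; /* THP_THRESHOLD_UNSET or a page count >= 1 */
};

/* Per-task override wins; otherwise fall back to the system default. */
size_t effective_threshold(const struct task_policy *t, size_t system_default)
{
    return t->thp_threshold != THP_THRESHOLD_UNSET ? t->thp_threshold
                                                   : system_default;
}

/* Model of exec: the new image inherits the policy unchanged. */
struct task_policy policy_across_exec(const struct task_policy *old)
{
    return *old;
}
```

A launcher could then set a threshold of 1 for a known short-lived job and
exec it, so that job promotes immediately regardless of the system default.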
> The goal to reduce memory overhead of THP is admirable, but we need to be
> careful not to kill THP benefit itself. The approach will reduce number of
> THP mapped in the system and/or shift their allocation to later stage of
> process lifetime.
>
While I agree with you, I also had suggested in review that the
threshold initially be set to 1 so it can be experimented with by
people who are more concerned about memory consumption than reduced TLB
misses. While the general idea is not free of problems, I believe they
are fixable rather than fundamental.
> Prove me wrong with performance data. :)
>
Agreed that this should be accompanied by performance data but I think I
laid out a reasonable approach here. If the default is a threshold of 1
and that is shown to be performance-neutral then incremental progress
can be made as opposed to an "all or nothing" approach.
--
Mel Gorman
SUSE Labs