From: Mel Gorman <mgorman@suse.de>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM ATTEND] Huge Page Futures
Date: Thu, 28 Jan 2016 09:21:22 +0000
Message-ID: <20160128092122.GH3104@suse.de>
In-Reply-To: <56A90345.3020903@oracle.com>

On Wed, Jan 27, 2016 at 09:49:57AM -0800, Mike Kravetz wrote:
> On 01/25/2016 05:50 AM, Mike Kravetz wrote:
> >> Do you have any thoughts how it's going to be implemented? It would be
> >> nice to have some design overview or better proof-of-concept patch before
> >> the summit to be able to analyze implications for the kernel.
> >>
> > 
> > Good to know the hugetlbfs implementation is considered a hack.  I just
> > started looking at this, and was going to use hugetlbfs as a starting
> > point.  I'll reconsider that decision.
> 
> Kirill, can you (or others) explain your reasons for saying the hugetlbfs
> implementation is an ugly hack?  I do not have enough history/experience
> with this to say what is most offensive.  I would be happy to start by
> cleaning up issues with the current implementation.
> 

Historically, it was considered a hack because it had special handling in
a number of paths in the VM. Of course THP also has similar handling now
so it's less of a concern but there are differences that cause base pages,
transparent hugepages and hugetlbfs pages to all be special cases. That
does not sit comfortably with everyone.

For a long time, it was considered ugly because a fault on private child
mappings was so unreliable and a fork could cause a parent to unexpectedly
fail a fault and die. These days it's different as only the child can die,
so while it's less of a concern, hugetlbfs still allows a child to be
killed if not enough huge pages are available.

It was also considered ugly because application-awareness was required in
so many cases. Granted, libhugetlbfs can hide some of that ugliness but
even that was considered hacky.
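
To illustrate the application-awareness point, here is a minimal sketch of
what an application has to do itself to get hugetlbfs-backed memory: round
the length up to the huge page size, ask for the mapping explicitly, and
cope with failure. The helper names, the 2 MiB default and the fallback
policy are assumptions made for illustration, not anything from this thread.
The 0x40000 value for MAP_HUGETLB is also an assumption that holds on x86
and arm64 Linux but not on every architecture.

```python
import mmap

# Assumption: 0x40000 is MAP_HUGETLB on x86/arm64 Linux. Some architectures
# use a different value, so prefer the module constant when it exists.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def round_up(length, huge_page_size):
    """Round length up to a whole number of huge pages."""
    return -(-length // huge_page_size) * huge_page_size


def map_with_fallback(length, huge_page_size=2 * 1024 * 1024):
    """Try an anonymous huge page mapping; fall back to base pages.

    Returns (mapping, used_huge_pages). Hypothetical helper, Linux only.
    """
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    rounded = round_up(length, huge_page_size)
    try:
        # Fails with OSError if the huge page pool cannot back the request.
        return mmap.mmap(-1, rounded, flags=flags | MAP_HUGETLB), True
    except (OSError, ValueError):
        # The application, not the kernel, decides what to do instead.
        return mmap.mmap(-1, length, flags=flags), False
```

With THP none of this is needed, which is roughly the contrast being drawn:
the rounding, the explicit flag and the fallback all live in the
application or in a wrapper library.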

That hugetlbfs pages cannot be swapped, even without mlock, is another
thing that makes them different to the rest of the VM. It also has its
own reservation scheme that is different to everything else.
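
The reservation scheme is visible from userspace through the HugePages_*
counters in /proc/meminfo. The sketch below parses them and computes how
many free huge pages are not already promised to an existing mapping. The
function names and the sample text are made up for illustration; only the
counter names come from the kernel.

```python
def parse_hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("HugePages_"):
            counters[name] = int(value.split()[0])
    return counters


def unreserved_huge_pages(counters):
    """Free pages that no mapping has reserved yet."""
    return counters["HugePages_Free"] - counters["HugePages_Rsvd"]


# Made-up sample in the /proc/meminfo layout, not real measurements.
SAMPLE = """\
MemTotal:       16384000 kB
HugePages_Total:      16
HugePages_Free:       12
HugePages_Rsvd:        4
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""
```

On a Linux system the same functions can be fed the contents of
/proc/meminfo instead of SAMPLE. A reserved page still counts as free
until it is faulted in, which is why the two counters have to be read
together.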

One thing that saddled it with the label to some extent was the fact that
fixing swap on it was effectively impossible because of POWER. Once huge
pages had been installed on that architecture for a long time, it was
impossible to remap them at a different size. The limitation has been
relaxed to some extent but those around long enough remember it.

So it is a bit of a hack that behaves differently to other page types.
It's fairly complex and while the semantics used to be a lot uglier than
they are now, the "ugly hack" label has stuck.

> If we do shared page tables for DAX, it makes sense that it and hugetlbfs
> should be similar (or common) if possible.
> 

It's been a long time since I looked at shared page tables so I can't
remember why but it was a difficult area. A few years were spent on it so
if shared page tables are being considered, I would make damn sure first
that they actually help on modern hardware before jumping into that hole.

-- 
Mel Gorman
SUSE Labs

