From: Jason Gunthorpe <jgg@ziepe.ca>
To: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Yosry Ahmed <yosryahmed@google.com>,
James Houghton <jthoughton@google.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Miaohe Lin <linmiaohe@huawei.com>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
Peter Xu <peterx@redhat.com>, Michal Hocko <mhocko@suse.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Jiaqi Yan <jiaqiyan@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Date: Tue, 13 Jun 2023 11:59:15 -0300 [thread overview]
Message-ID: <ZIiEQ+cMPGkIcAEN@ziepe.ca> (raw)
In-Reply-To: <ZII1p8ZHlHaQ3dDl@casper.infradead.org>
On Thu, Jun 08, 2023 at 09:10:15PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 08, 2023 at 08:34:10AM +0200, David Hildenbrand wrote:
> > On 08.06.23 02:02, David Rientjes wrote:
> > > While people have proposed 1GB THP support in the past, it was nacked, in
> > > part, because of the suggestion to just use existing 1GB support in
> > > hugetlb instead :)
> >
> > Yes, because I still think that the use for "transparent" (for the user)
> > huge pages nowadays is very limited and not worth the complexity.
> >
> > IMHO, what you really want is a pool of large pages with guarantees (about
> > availability and nodes) and fine control over who gets these pages. That's
> > what hugetlb provides.
> >
> > In contrast to THP, you don't want to allow for
> > * Partially mmap'ing, mremap'ing, munmap'ing, or mprotect'ing them
> > * Partially sharing them / COW'ing them
> > * Partially mixing them with other anon pages (MADV_DONTNEED + refault)
> > * Excluding them from some features (KSM/swap)
> > * (swapping them out and eventually splitting them for that)
> >
> > Because you don't want to get these pages PTE-mapped by the system *unless*
> > there is a real reason (HGM, hwpoison) -- you want guarantees. Once such a
> > page is PTE-mapped, you only want to collapse in place.
> >
> > But you don't want special-HGM, you simply want the core to PTE-map them
> > like a (file) THP.
> >
> > IMHO, getting that realized would be much easier if we didn't have to care
> > about some of the hugetlb complexity I raised (MAP_PRIVATE, PMD sharing),
> > but maybe there is a way ...
>
> I favour a more evolutionary than revolutionary approach. That is,
> I think it's acceptable to add new features to hugetlbfs _if_ they're
> combined with cleanup work that gets hugetlbfs closer to the main mm.
> This is why I harp on things like pagewalk that currently need special
> handling for hugetlb -- that's pointless; they should just be treated as
> large folios. GUP handles hugetlb separately too, and I'm not sure why.
Yes, this echoes my feelings too.
Making all the special core-mm cases around hugetlb even more
complicated with HGM seems like a non-starter.
We need to get to a point where the core-mm handles all the PTE
programming and supports arbitrary order folios in the page tables
uniformly for everyone.
hugetlb is just a special high order folio provider.
Get rid of all the special PTE formats, unique arch code, and special
code in gup.c/pagewalkers/etc that supports hugetlbfs.
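To make the kind of special-casing concrete (this is not from the thread; the
"count_present" walker below is a made-up example, but struct mm_walk_ops and
its ->pte_entry / ->hugetlb_entry callbacks are the real pagewalk interface
from include/linux/pagewalk.h as of roughly this timeframe): a pagewalk user
that needs to see hugetlb mappings today must register a dedicated hugetlb
callback next to its normal one, duplicating the same logic.

    /*
     * Illustration only: a made-up pagewalk user counting present pages.
     * The second callback exists purely because hugetlb is walked through
     * its own special path rather than as an ordinary large folio.
     */
    #include <linux/pagewalk.h>
    #include <linux/hugetlb.h>

    static int count_pte(pte_t *pte, unsigned long addr,
                         unsigned long next, struct mm_walk *walk)
    {
            unsigned long *present = walk->private;

            if (pte_present(ptep_get(pte)))
                    (*present)++;
            return 0;
    }

    /* Duplicate of the above, kept only for the hugetlb walk path. */
    static int count_hugetlb(pte_t *pte, unsigned long hmask,
                             unsigned long addr, unsigned long next,
                             struct mm_walk *walk)
    {
            unsigned long *present = walk->private;

            if (pte_present(huge_ptep_get(pte)))
                    (*present)++;
            return 0;
    }

    static const struct mm_walk_ops count_ops = {
            .pte_entry     = count_pte,
            .hugetlb_entry = count_hugetlb,
    };

If hugetlb were walked as just another large-folio mapping, the
->hugetlb_entry/huge_ptep_get() variant (and the analogous special cases in
gup.c) could be deleted and every walker would shrink accordingly.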
I think the general path to do that is to make the core-mm and all the
hugetlb supporting arches support a core-code path for working with
high order folios in page tables.
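For a rough picture of what such a core-code path could look like, here is a
purely hypothetical sketch -- none of the core_install_*() helpers exist, the
names are invented for illustration. The idea is that hugetlbfs stops doing
its own PTE programming and becomes one more caller handing the core a high
order folio; the core picks whatever page-table level fits and PTE-maps only
when it has to (HGM, hwpoison, unaligned ranges).

    /*
     * Hypothetical sketch only; core_install_pud/pmd/ptes() are made up.
     * The caller just supplies a folio, the core picks the mapping level.
     */
    static vm_fault_t map_large_folio(struct vm_fault *vmf, struct folio *folio)
    {
            unsigned long addr = vmf->address;
            unsigned int order = folio_order(folio);

            if (order >= PUD_SHIFT - PAGE_SHIFT && IS_ALIGNED(addr, PUD_SIZE))
                    return core_install_pud(vmf, folio);  /* e.g. 1G on x86-64 */

            if (order >= PMD_SHIFT - PAGE_SHIFT && IS_ALIGNED(addr, PMD_SIZE))
                    return core_install_pmd(vmf, folio);  /* e.g. 2M on x86-64 */

            return core_install_ptes(vmf, folio);         /* fall back to PTEs */
    }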
Maybe this is demo'd & tested with a temporary/simplified hugetlbfs
uAPI. When the core MM and all the arches are ready you switch
hugetlbfs to use the new core API and delete all the page walk
special cases.
From there you can then teach the core code to do all the splitting
and whatever else you want.
Jason