From: James Houghton <jthoughton@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	 Peter Xu <peterx@redhat.com>
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Date: Tue, 14 Mar 2023 08:37:58 -0700
Message-ID: <CADrL8HWyMvu9DaF9ci2v0eNabj_RsXEcRGKeFOSV_+VQjK3q=g@mail.gmail.com>
In-Reply-To: <20230306191944.GA15773@monkey>

On Mon, Mar 6, 2023 at 11:19 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> This is past the deadline, so feel free to ignore.  However, ...
>
> James Houghton has been working on the concept of HugeTLB High Granularity
> Mapping (HGM) as discussed here:
> https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/
>
> The primary motivation for this work is post-copy live migration of VMs backed
> by hugetlb pages via userfaultfd.  A followup use case is more gracefully
> handling memory errors/poison on hugetlb pages.
>
> As can be seen by the size of James's patch set, the required changes for
> HGM are a bit complex and involved.  This is also complicated by the need
> to choose a 'mapcount strategy', as the previous scheme used by hugetlb
> will no longer work.
>
> An HGM for hugetlbfs session would present the current approach and challenges.
> While much of the work is confined to hugetlb, there is a bit of spillover
> into other mm areas, specifically page table walking.  A discussion on ways
> to move forward with this effort would be appreciated.

Thanks for proposing this, Mike.

To hopefully get more interest in this topic, I want to lay out the
reasons that Google uses HugeTLB for VMs today. They are:
- Guaranteed availability of hugepages
- Guaranteed NUMA alignment
- Availability of 1G pages
- HugeTLB vmemmap optimization to save page struct overhead

Until generic mm supports all this, HugeTLB will remain a very
important piece of Linux for us. :)
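
For a sense of scale on the vmemmap item, here is a rough userspace
sketch (not kernel code). It assumes 4 KiB base pages and a 64-byte
struct page, which is a common x86-64 configuration but is an
assumption here, not something stated in this thread. The helper names
are made up for illustration, and the number of vmemmap pages the
kernel keeps per hugepage is left as a parameter.

```c
#include <assert.h>

/* Assumptions, not from this thread: 4 KiB base pages, 64-byte struct page. */
#define BASE_PAGE_SIZE   4096UL
#define STRUCT_PAGE_SIZE 64UL

/* Hypothetical helper: vmemmap pages needed to describe one hugepage. */
static unsigned long vmemmap_pages(unsigned long hpage_size)
{
	assert(hpage_size % BASE_PAGE_SIZE == 0);
	unsigned long nr_struct_pages = hpage_size / BASE_PAGE_SIZE;

	return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* Hypothetical helper: vmemmap pages freed if `kept` pages must remain. */
static unsigned long vmemmap_pages_saved(unsigned long hpage_size,
					 unsigned long kept)
{
	unsigned long total = vmemmap_pages(hpage_size);

	return total > kept ? total - kept : 0;
}
```

Under these assumptions, a 1 GiB hugepage needs 4096 vmemmap pages
(16 MiB) and a 2 MiB hugepage needs 8. How many of them can be freed
depends on how many the kernel has to keep.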

The main limitation of HugeTLB that I care about is that it can only
map an entire hugepage at once; it can never partially map a hugepage
(that is, there is no such thing as a PTE-mapped HugeTLB page). As Mike
said, this makes the following use cases impossible:
1. With userfaultfd-based live migration, fetching and installing
memory at PAGE_SIZE granularity.
2. Handling memory poison at PAGE_SIZE granularity.

HugeTLB high-granularity mapping (HGM) is an effort to make #1 and #2
possible with HugeTLB.
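
To make the granularity gap in #1 concrete, here is a minimal
userspace sketch (not kernel code, and not taken from the HGM series).
It assumes a 4 KiB PAGE_SIZE and 2 MiB / 1 GiB hugepage sizes, as on
x86-64; the helper names are hypothetical. It models the smallest
amount of data that must be fetched before a faulting mapping can be
installed, with and without high-granularity mapping.

```c
#include <assert.h>

/* Assumed sizes (x86-64 defaults); not taken from the patch set. */
#define BASE_PAGE_SIZE 4096UL
#define HPAGE_SIZE_2M  (2UL << 20)
#define HPAGE_SIZE_1G  (1UL << 30)

/*
 * Hypothetical helper: bytes that must be present before a mapping can
 * be installed. Without HGM the whole hugepage is needed; with HGM a
 * single base page is enough.
 */
static unsigned long min_fetch_bytes(unsigned long hpage_size, int hgm)
{
	assert(hpage_size % BASE_PAGE_SIZE == 0);
	return hgm ? BASE_PAGE_SIZE : hpage_size;
}

/* Hypothetical helper: how much more data the non-HGM case must fetch. */
static unsigned long fetch_amplification(unsigned long hpage_size)
{
	return min_fetch_bytes(hpage_size, 0) / min_fetch_bytes(hpage_size, 1);
}
```

With these sizes, resolving a single fault on a 1 GiB page without HGM
means fetching 262144 times as much data as a PAGE_SIZE fetch would
(512 times for a 2 MiB page).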

#1 and #2 are already possible with generic mm, so this also raises
the question: Can we merge HugeTLB with generic mm? This would
certainly be much more work than HGM, but it would remove all those
pesky HugeTLB special cases (though we would still want all the
features that HugeTLB has).

Coming up with a plan to merge HugeTLB with generic mm would be
challenging, and LSFMM might be a good place to have such a
discussion. Not all of HugeTLB would need to be merged. I think some
of the main special cases that should be removed are:
1. hugetlb_fault (fault/GUP special case)
2. page_vma_mapped_walk's special case
3. hugetlb_entry in pagewalk
4. HugeTLB's rmap/mapcount special cases (already working on this!)

As part of this merge/unification, architectures would need to merge
their hugetlb implementations with their generic mm implementations
(for example, moving any special logic from set_huge_pte_at to
set_pte_at).

These are just some initial thoughts; I'm sure many of you have your
own ideas for this.

A discussion about HGM might serve as a jumping-off point for ideas
on how to enhance the generic mm implementation to make the
unification possible.


- James Houghton

