From: Mike Kravetz <mike.kravetz@oracle.com>
To: James Houghton <jthoughton@google.com>
Cc: Peter Xu <peterx@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Zach O'Keefe <zokeefe@google.com>,
	Manish Mishra <manish.mishra@nutanix.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
Date: Wed, 21 Dec 2022 16:38:21 -0800
Message-ID: <Y6Om/dvlt1Wl2uZw@monkey>
In-Reply-To: <CADrL8HX997CguZWkH3vB4+cYjwLc0mYV4GkroE41bCLRqFiPpg@mail.gmail.com>

On 12/21/22 19:02, James Houghton wrote:
> On Wed, Dec 21, 2022 at 5:32 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 12/21/22 17:10, Peter Xu wrote:
> > > On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > > > On 12/21/22 15:21, James Houghton wrote:
> > > > > Thanks for bringing this up, Peter. I think the main reason was:
> > > > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > > > and is not supported.
> > > >
> > > > IIRC, I think we wanted to initially limit the usage to the very
> > > > specific use case (live migration).  The idea is that we could then
> > > > expand usage as more use cases came to light.
> > > >
> > > > Another good thing is that userfaultfd has versioning built into the
> > > > API.  Thus a user can determine if HGM is enabled in their running
> > > > kernel.
> > >
> > > I don't worry much on this one, afaiu if we have any way to enable hgm then
> > > the user can just try enabling it on a test vma, just like when an app
> > > wants to detect whether a new madvise() is present on the current host OS.
> 
> That would be enough to test if HGM was merely present, but not if
> specific features like 4K UFFDIO_CONTINUEs or 4K UFFDIO_WRITEPROTECTs
> were available. You could always check these by making a HugeTLB VMA
> and setting it up correctly for userfaultfd/etc., but that's a little
> messy.
> 
> > >
> > > Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> > > would work too.
> 
> I'm not opposed to this.
> 
> > >
> > > >
> > > > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > > > allowed as of this patch series, but it could be allowed in the
> > > > > future. To add support in the same way as this series, we would add
> > > > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > > > having to add another feature isn't great; is this what you're
> > > > > concerned about?
> > > > >
> > > > > Considering MADV_ENABLE_HUGETLB...
> > > > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > > > high-granularities, provided the support exists", but it becomes
> > > > > unclear to userspace to know what's supported and what isn't.
> > > > > 2. We would then need to keep track if a user explicitly enabled it,
> > > > > or if it got enabled automatically in response to memory poison, for
> > > > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > > > got enabled for poison, suddenly userspace would be allowed to do
> > > > > things it wasn't allowed to do before.)
> > >
> > > We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> > > hgm_enabled.  (a) always sets (b) but not vice versa.  We can limit poison
> > > to set (b) only.  For this patchset, it can be all about (a).
> 
> My thoughts exactly. :)
> 
> > >
> > > > > 3. This API makes sense for enabling HGM for something outside of
> > > > > userfaultfd, like MADV_DONTNEED.
> > > >
> > > > I think #3 is key here.  Once we start applying HGM to things outside
> > > > userfaultfd, then more thought will be required on APIs.  The API is
> > > > somewhat limited by design until the basic functionality is in place.
> > >
> > > Mike, could you elaborate what's the major concern of having hgm used
> > > outside uffd and live migration use cases?
> > >
> > > I feel like I miss something here.  I can understand we want to limit the
> > > usage only when the user specifies using hgm because we want to keep the
> > > old behavior intact.  However if we want another way to enable hgm it'll
> > > still need one knob anyway even outside uffd, and I thought that'll serve
> > > the same purpose, or maybe not?
> >
> > I am not opposed to using hgm outside the use cases targeted by this series.
> >
> > It seems that when we were previously discussing the API we spent a bunch of
> > time going around in circles trying to get the API correct.  That is expected
> > as it is more difficult to take all users/uses/abuses of the API into account.
> >
> > Since the initial use case was fairly limited, it seemed like a good idea to
> > limit the API to userfaultfd.  In this way we could focus on the underlying
> > code/implementation and then expand as needed.  Of course, with an eye on
> > anything that may be a limiting factor in the future.
> >
> > I was not aware of the uffd-wp use case, and am more than happy to discuss
> > expanding the API.
> 
> So considering two API choices:
> 
> 1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
> UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
> UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
> allowing high-granularity choices (not sure if this is bad; we started
> allowing it for HugeTLB recently with no other API change, AFAIA).

I don't think we can just start allowing HGM for MADV_DONTNEED without
some type of user interaction/request.  Otherwise, a user that passes
in non-hugetlb page size requests may get unexpected results.  And, one
of the threads about MADV_DONTNEED points out a valid use case where
the caller may not know whether the mapping is hugetlb and is likely to
pass in non-hugetlb page size requests.
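
To illustrate the opt-in requirement, here is a small userspace model of the
advised/enabled split suggested earlier in the thread.  It is only a sketch:
struct hgm_state and the three helpers are made-up names, and nothing here is
taken from the patch series.  The point is that enablement by the kernel
(e.g. for poison) never changes what userspace is allowed to request.

```c
#include <stdbool.h>

/* Stand-in for per-VMA HGM state; type and field names are hypothetical. */
struct hgm_state {
	bool advised;	/* userspace explicitly asked for HGM */
	bool enabled;	/* page tables may contain high-granularity mappings */
};

/* Explicit user request (a MADV_ENABLE_HGM-style call): sets both flags. */
static void hgm_advise(struct hgm_state *s)
{
	s->advised = true;
	s->enabled = true;
}

/* Kernel-internal enablement such as memory poison handling: enabled only. */
static void hgm_enable_for_poison(struct hgm_state *s)
{
	s->enabled = true;
}

/*
 * Requests at less than huge page size (UFFDIO_CONTINUE, MADV_DONTNEED, ...)
 * are honored only when userspace opted in.  A caller that does not know the
 * mapping is hugetlb therefore keeps seeing the existing behavior.
 */
static bool hgm_user_ops_allowed(const struct hgm_state *s)
{
	return s->advised;
}
```

With this shape, a VMA that only went through hgm_enable_for_poison() still
refuses high-granularity requests from userspace.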

> 2. MADV_ENABLE_HGM or something similar. The changes to
> UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
> provided they are implemented.
> 
> I don't mind one way or the other. Peter, I assume you prefer #2.
> Mike, what about you? If we decide on something other than #1, I'll
> make the change before sending v1 out.

Since I do not believe 1) is an option, MADV_ENABLE_HGM might be the way
to go.  Any thoughts about MADV_ENABLE_HGM?  I'm thinking:
- Make it have the same restrictions as other madvise hugetlb calls,
  . addr must be huge page aligned
  . length is rounded down to a multiple of huge page size
- We split the vma as required
- Flags carrying HGM state reside in the hugetlb_shared_vma_data struct
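
A rough sketch of the address/length rules in the list above, written as
plain userspace C rather than kernel code.  The names hgm_range and
hgm_normalize_range are invented for illustration, and the huge page size is
assumed to be a nonzero power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not part of the kernel or the series. */
struct hgm_range {
	uintptr_t start;
	size_t len;
};

/*
 * Hypothetical helper mirroring the proposed restrictions:
 *  - addr must be huge page aligned, otherwise the request is rejected;
 *  - len is rounded down to a multiple of the huge page size.
 * hpage_size is assumed to be a nonzero power of two.
 */
static bool hgm_normalize_range(uintptr_t addr, size_t len, size_t hpage_size,
				struct hgm_range *out)
{
	assert(out != NULL);

	/* Reject sizes that are not a power of two; the masks below need it. */
	if (hpage_size == 0 || (hpage_size & (hpage_size - 1)) != 0)
		return false;

	/* A start address inside a huge page is an error, not rounded. */
	if ((addr & (hpage_size - 1)) != 0)
		return false;

	out->start = addr;
	out->len = len & ~(hpage_size - 1);
	return true;
}
```

Note that a length smaller than one huge page rounds down to zero here;
whether that should be a no-op or an error is left open.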

-- 
Mike Kravetz


