All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: James Houghton <jthoughton@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
	Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Zach O'Keefe <zokeefe@google.com>,
	Manish Mishra <manish.mishra@nutanix.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 12/47] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step
Date: Tue, 13 Dec 2022 16:47:33 -0800	[thread overview]
Message-ID: <Y5kdJQgrPRwvw1VJ@monkey> (raw)
In-Reply-To: <20221021163703.3218176-13-jthoughton@google.com>

On 10/21/22 16:36, James Houghton wrote:
> hugetlb_hgm_walk implements high-granularity page table walks for
> HugeTLB. It is safe to call on non-HGM enabled VMAs; it will return
> immediately.
> 
> hugetlb_walk_step implements how we step forwards in the walk. For
> architectures that don't use GENERAL_HUGETLB, they will need to provide
> their own implementation.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h |  13 +++++
>  mm/hugetlb.c            | 125 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 138 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 003255b0e40f..4b1548adecde 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -276,6 +276,10 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
>  pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  		      unsigned long addr, pud_t *pud);
>  
> +int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
> +		     struct hugetlb_pte *hpte, unsigned long addr,
> +		     unsigned long sz, bool stop_at_none);
> +
>  struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
>  
>  extern int sysctl_hugetlb_shm_group;
> @@ -288,6 +292,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>  pte_t *huge_pte_offset(struct mm_struct *mm,
>  		       unsigned long addr, unsigned long sz);
>  unsigned long hugetlb_mask_last_page(struct hstate *h);
> +int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		      unsigned long addr, unsigned long sz);
>  int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
>  				unsigned long addr, pte_t *ptep);
>  void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
> @@ -1066,6 +1072,8 @@ void hugetlb_register_node(struct node *node);
>  void hugetlb_unregister_node(struct node *node);
>  #endif
>  
> +enum hugetlb_level hpage_size_to_level(unsigned long sz);
> +
>  #else	/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  
> @@ -1253,6 +1261,11 @@ static inline void hugetlb_register_node(struct node *node)
>  static inline void hugetlb_unregister_node(struct node *node)
>  {
>  }
> +
> +static inline enum hugetlb_level hpage_size_to_level(unsigned long sz)
> +{
> +	return HUGETLB_LEVEL_PTE;
> +}
>  #endif	/* CONFIG_HUGETLB_PAGE */
>  
>  #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e3733388adee..90db59632559 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -95,6 +95,29 @@ static void hugetlb_vma_data_free(struct vm_area_struct *vma);
>  static int hugetlb_vma_data_alloc(struct vm_area_struct *vma);
>  static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
>  
> +/*
> + * hpage_size_to_level() - convert @sz to the corresponding page table level
> + *
> + * @sz must be less than or equal to a valid hugepage size.
> + */
> +enum hugetlb_level hpage_size_to_level(unsigned long sz)
> +{
> +	/*
> +	 * We order the conditionals from smallest to largest to pick the
> +	 * smallest level when multiple levels have the same size (i.e.,
> +	 * when levels are folded).
> +	 */
> +	if (sz < PMD_SIZE)
> +		return HUGETLB_LEVEL_PTE;
> +	if (sz < PUD_SIZE)
> +		return HUGETLB_LEVEL_PMD;
> +	if (sz < P4D_SIZE)
> +		return HUGETLB_LEVEL_PUD;
> +	if (sz < PGDIR_SIZE)
> +		return HUGETLB_LEVEL_P4D;
> +	return HUGETLB_LEVEL_PGD;
> +}
> +
>  static inline bool subpool_is_free(struct hugepage_subpool *spool)
>  {
>  	if (spool->count)
> @@ -7321,6 +7344,70 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
>  }
>  #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
>  
> +/* hugetlb_hgm_walk - walks a high-granularity HugeTLB page table to resolve
> + * the page table entry for @addr.
> + *
> + * @hpte must always be pointing at an hstate-level PTE (or deeper).
> + *
> + * This function will never walk further if it encounters a PTE of a size
> + * less than or equal to @sz.
> + *
> + * @stop_at_none determines what we do when we encounter an empty PTE. If true,
> + * we return that PTE. If false and @sz is less than the current PTE's size,
> + * we make that PTE point to the next level down, going until @sz is the same
> + * as our current PTE.

I was a bit confused about 'we return that PTE' when the function is of type
int.  TBH, I am not a fan of the current scheme of passing in *hpte and having
the hpte modified by the function.

> + *
> + * If @stop_at_none is true and @sz is PAGE_SIZE, this function will always
> + * succeed, but that does not guarantee that hugetlb_pte_size(hpte) is @sz.
> + *
> + * Return:
> + *	-ENOMEM if we couldn't allocate new PTEs.
> + *	-EEXIST if the caller wanted to walk further than a migration PTE,
> + *		poison PTE, or a PTE marker. The caller needs to manually deal
> + *		with this scenario.
> + *	-EINVAL if called with invalid arguments (@sz invalid, @hpte not
> + *		initialized).
> + *	0 otherwise.
> + *
> + *	Even if this function fails, @hpte is guaranteed to always remain
> + *	valid.
> + */
> +int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
> +		     struct hugetlb_pte *hpte, unsigned long addr,
> +		     unsigned long sz, bool stop_at_none)

Since we are potentially populating lower level page tables, we may want a
different function name.  It may just be me, but I think of walk as a read
only operation.  I would suggest putting populate in the name, but as Peter
pointed out elsewhere, that has other implications.  Sorry, I can not think
of something better right now.

-- 
Mike Kravetz

> +{
> +	int ret = 0;
> +	pte_t pte;
> +
> +	if (WARN_ON_ONCE(sz < PAGE_SIZE))
> +		return -EINVAL;
> +
> +	if (!hugetlb_hgm_enabled(vma)) {
> +		if (stop_at_none)
> +			return 0;
> +		return sz == huge_page_size(hstate_vma(vma)) ? 0 : -EINVAL;
> +	}
> +
> +	hugetlb_vma_assert_locked(vma);
> +
> +	if (WARN_ON_ONCE(!hpte->ptep))
> +		return -EINVAL;
> +
> +	while (hugetlb_pte_size(hpte) > sz && !ret) {
> +		pte = huge_ptep_get(hpte->ptep);
> +		if (!pte_present(pte)) {
> +			if (stop_at_none)
> +				return 0;
> +			if (unlikely(!huge_pte_none(pte)))
> +				return -EEXIST;
> +		} else if (hugetlb_pte_present_leaf(hpte, pte))
> +			return 0;
> +		ret = hugetlb_walk_step(mm, hpte, addr, sz);
> +	}
> +
> +	return ret;
> +}
> +
>  #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long addr, unsigned long sz)
> @@ -7388,6 +7475,44 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  	return (pte_t *)pmd;
>  }
>  
> +/*
> + * hugetlb_walk_step() - Walk the page table one step to resolve the page
> + * (hugepage or subpage) entry at address @addr.
> + *
> + * @sz always points at the final target PTE size (e.g. PAGE_SIZE for the
> + * lowest level PTE).
> + *
> + * @hpte will always remain valid, even if this function fails.
> + */
> +int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		      unsigned long addr, unsigned long sz)
> +{
> +	pte_t *ptep;
> +	spinlock_t *ptl;
> +
> +	switch (hpte->level) {
> +	case HUGETLB_LEVEL_PUD:
> +		ptep = (pte_t *)hugetlb_pmd_alloc(mm, hpte, addr);
> +		if (IS_ERR(ptep))
> +			return PTR_ERR(ptep);
> +		hugetlb_pte_populate(hpte, ptep, PMD_SHIFT, HUGETLB_LEVEL_PMD);
> +		break;
> +	case HUGETLB_LEVEL_PMD:
> +		ptep = hugetlb_pte_alloc(mm, hpte, addr);
> +		if (IS_ERR(ptep))
> +			return PTR_ERR(ptep);
> +		ptl = pte_lockptr(mm, (pmd_t *)hpte->ptep);
> +		hugetlb_pte_populate(hpte, ptep, PAGE_SHIFT, HUGETLB_LEVEL_PTE);
> +		hpte->ptl = ptl;
> +		break;
> +	default:
> +		WARN_ONCE(1, "%s: got invalid level: %d (shift: %d)\n",
> +				__func__, hpte->level, hpte->shift);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Return a mask that can be used to update an address to the last huge
>   * page in a page table page mapping size.  Used to skip non-present
> -- 
> 2.38.0.135.g90850a2211-goog
> 


  parent reply	other threads:[~2022-12-14  0:48 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-21 16:36 [RFC PATCH v2 00/47] hugetlb: introduce HugeTLB high-granularity mapping James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 01/47] hugetlb: don't set PageUptodate for UFFDIO_CONTINUE James Houghton
2022-11-16 16:30   ` Peter Xu
2022-11-21 18:33     ` James Houghton
2022-12-08 22:55       ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 02/47] hugetlb: remove mk_huge_pte; it is unused James Houghton
2022-11-16 16:35   ` Peter Xu
2022-12-07 23:13   ` Mina Almasry
2022-12-08 23:42   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 03/47] hugetlb: remove redundant pte_mkhuge in migration path James Houghton
2022-11-16 16:36   ` Peter Xu
2022-12-07 23:16   ` Mina Almasry
2022-12-09  0:10   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 04/47] hugetlb: only adjust address ranges when VMAs want PMD sharing James Houghton
2022-11-16 16:50   ` Peter Xu
2022-12-09  0:22   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 05/47] hugetlb: make hugetlb_vma_lock_alloc return its failure reason James Houghton
2022-11-16 17:08   ` Peter Xu
2022-11-21 18:11     ` James Houghton
2022-12-07 23:33   ` Mina Almasry
2022-12-09 22:36   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 06/47] hugetlb: extend vma lock for shared vmas James Houghton
2022-11-30 21:01   ` Peter Xu
2022-11-30 23:29     ` James Houghton
2022-12-09 22:48     ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 07/47] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING James Houghton
2022-12-09 22:52   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions James Houghton
2022-11-16 17:19   ` Peter Xu
2022-12-08  0:26   ` Mina Almasry
2022-12-09 15:41     ` James Houghton
2022-12-13  0:13   ` Mike Kravetz
2022-12-13 15:49     ` James Houghton
2022-12-15 17:51       ` Mike Kravetz
2022-12-15 18:08         ` James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 09/47] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
2022-12-08  0:30   ` Mina Almasry
2022-12-13  0:25   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 10/47] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
2022-11-16 22:17   ` Peter Xu
2022-11-17  1:00     ` James Houghton
2022-11-17 16:27       ` Peter Xu
2022-12-08  0:46   ` Mina Almasry
2022-12-09 16:02     ` James Houghton
2022-12-13 18:44       ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 11/47] hugetlb: add hugetlb_pmd_alloc and hugetlb_pte_alloc James Houghton
2022-12-13 19:32   ` Mike Kravetz
2022-12-13 20:18     ` James Houghton
2022-12-14  0:04       ` James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 12/47] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step James Houghton
2022-11-16 22:02   ` Peter Xu
2022-11-17  1:39     ` James Houghton
2022-12-14  0:47   ` Mike Kravetz [this message]
2023-01-05  0:57   ` Jane Chu
2023-01-05  1:12     ` Jane Chu
2023-01-05  1:23     ` James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 13/47] hugetlb: add make_huge_pte_with_shift James Houghton
2022-12-14  1:08   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 14/47] hugetlb: make default arch_make_huge_pte understand small mappings James Houghton
2022-12-14 22:17   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 15/47] hugetlbfs: for unmapping, treat HGM-mapped pages as potentially mapped James Houghton
2022-12-14 23:37   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 16/47] hugetlb: make unmapping compatible with high-granularity mappings James Houghton
2022-12-15  0:28   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 17/47] hugetlb: make hugetlb_change_protection compatible with HGM James Houghton
2022-12-15 18:15   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 18/47] hugetlb: enlighten follow_hugetlb_page to support HGM James Houghton
2022-12-15 19:29   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 19/47] hugetlb: make hugetlb_follow_page_mask HGM-enabled James Houghton
2022-12-16  0:25   ` Mike Kravetz
2022-10-21 16:36 ` [RFC PATCH v2 20/47] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 21/47] mm: rmap: provide pte_order in page_vma_mapped_walk James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 22/47] mm: rmap: make page_vma_mapped_walk callers use pte_order James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 23/47] rmap: update hugetlb lock comment for HGM James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 24/47] hugetlb: update page_vma_mapped to do high-granularity walks James Houghton
2022-12-15 17:49   ` James Houghton
2022-12-15 18:45     ` Peter Xu
2022-10-21 16:36 ` [RFC PATCH v2 25/47] hugetlb: add HGM support for copy_hugetlb_page_range James Houghton
2022-11-30 21:32   ` Peter Xu
2022-11-30 23:18     ` James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 26/47] hugetlb: make move_hugetlb_page_tables compatible with HGM James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 27/47] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 28/47] rmap: in try_to_{migrate,unmap}_one, check head page for page flags James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 29/47] hugetlb: add high-granularity migration support James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 30/47] hugetlb: add high-granularity check for hwpoison in fault path James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 31/47] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 32/47] hugetlb: add for_each_hgm_shift James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM James Houghton
2022-11-16 22:28   ` Peter Xu
2022-11-16 23:30     ` James Houghton
2022-12-21 19:23       ` Peter Xu
2022-12-21 20:21         ` James Houghton
2022-12-21 21:39           ` Mike Kravetz
2022-12-21 22:10             ` Peter Xu
2022-12-21 22:31               ` Mike Kravetz
2022-12-22  0:02                 ` James Houghton
2022-12-22  0:38                   ` Mike Kravetz
2022-12-22  1:24                     ` James Houghton
2022-12-22 14:30                       ` Peter Xu
2022-12-27 17:02                         ` James Houghton
2023-01-03 17:06                           ` Peter Xu
2022-10-21 16:36 ` [RFC PATCH v2 34/47] hugetlb: userfaultfd: add support for high-granularity UFFDIO_CONTINUE James Houghton
2022-11-17 16:58   ` Peter Xu
2022-12-23 18:38   ` Peter Xu
2022-12-27 16:38     ` James Houghton
2023-01-03 17:09       ` Peter Xu
2022-10-21 16:36 ` [RFC PATCH v2 35/47] userfaultfd: require UFFD_FEATURE_EXACT_ADDRESS when using HugeTLB HGM James Houghton
2022-12-22 21:47   ` Peter Xu
2022-12-27 16:39     ` James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 36/47] hugetlb: add MADV_COLLAPSE for hugetlb James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 37/47] hugetlb: remove huge_pte_lock and huge_pte_lockptr James Houghton
2022-11-16 20:16   ` Peter Xu
2022-10-21 16:36 ` [RFC PATCH v2 38/47] hugetlb: replace make_huge_pte with make_huge_pte_with_shift James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 39/47] mm: smaps: add stats for HugeTLB mapping size James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 40/47] hugetlb: x86: enable high-granularity mapping James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 41/47] docs: hugetlb: update hugetlb and userfaultfd admin-guides with HGM info James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 42/47] docs: proc: include information about HugeTLB HGM James Houghton
2022-10-21 16:36 ` [RFC PATCH v2 43/47] selftests/vm: add HugeTLB HGM to userfaultfd selftest James Houghton
2022-10-21 16:37 ` [RFC PATCH v2 44/47] selftests/kvm: add HugeTLB HGM to KVM demand paging selftest James Houghton
2022-10-21 16:37 ` [RFC PATCH v2 45/47] selftests/vm: add anon and shared hugetlb to migration test James Houghton
2022-10-21 16:37 ` [RFC PATCH v2 46/47] selftests/vm: add hugetlb HGM test to migration selftest James Houghton
2022-10-21 16:37 ` [RFC PATCH v2 47/47] selftests/vm: add HGM UFFDIO_CONTINUE and hwpoison tests James Houghton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5kdJQgrPRwvw1VJ@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=jthoughton@google.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=manish.mishra@nutanix.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.