From: Mike Kravetz <mike.kravetz@oracle.com>
To: James Houghton <jthoughton@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
	Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Zach O'Keefe <zokeefe@google.com>,
	Manish Mishra <manish.mishra@nutanix.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 11/47] hugetlb: add hugetlb_pmd_alloc and hugetlb_pte_alloc
Date: Tue, 13 Dec 2022 11:32:29 -0800
Message-ID: <Y5jTTYwzFHgmm5l6@monkey>
In-Reply-To: <20221021163703.3218176-12-jthoughton@google.com>

On 10/21/22 16:36, James Houghton wrote:
> These functions are used to allocate new PTEs below the hstate PTE. This
> will be used by hugetlb_walk_step, which implements stepping forwards in
> a HugeTLB high-granularity page table walk.
> 
> The reasons that we don't use the standard pmd_alloc/pte_alloc*
> functions are:
>  1) This prevents us from accidentally overwriting swap entries or
>     attempting to use swap entries as present non-leaf PTEs (see
>     pmd_alloc(); we assume that !pte_none means pte_present and
>     non-leaf).
>  2) Locking hugetlb PTEs can be different than regular PTEs. (Although, as
>     implemented right now, locking is the same.)
>  3) We can maintain compatibility with CONFIG_HIGHPTE. That is, HugeTLB
>     HGM won't use HIGHPTE, but the kernel can still be built with it,
>     and other mm code will use it.
> 
> When GENERAL_HUGETLB supports P4D-based hugepages, we will need to
> implement hugetlb_pud_alloc to implement hugetlb_walk_step.
> 
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
>  include/linux/hugetlb.h |  5 +++
>  mm/hugetlb.c            | 94 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 99 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d30322108b34..003255b0e40f 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -119,6 +119,11 @@ void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
>  
>  bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
>  
> +pmd_t *hugetlb_pmd_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		unsigned long addr);
> +pte_t *hugetlb_pte_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		unsigned long addr);
> +
>  struct hugepage_subpool {
>  	spinlock_t lock;
>  	long count;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a0e46d35dabc..e3733388adee 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -341,6 +341,100 @@ static bool has_same_uncharge_info(struct file_region *rg,
>  #endif
>  }
>  
> +pmd_t *hugetlb_pmd_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		unsigned long addr)

I'm a little confused, since there are no users yet. Is hpte the 'hstate
PTE' that we are trying to allocate PTEs under?  For example, for a caller
of hugetlb_pmd_alloc, would hpte be a PUD or CONT_PMD sized PTE?

> +{
> +	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> +	pmd_t *new;
> +	pud_t *pudp;
> +	pud_t pud;
> +
> +	if (hpte->level != HUGETLB_LEVEL_PUD)
> +		return ERR_PTR(-EINVAL);

Ah yes, it is PUD level.  However, I guess CONT_PMD would also be valid
on arm64?

> +
> +	pudp = (pud_t *)hpte->ptep;
> +retry:
> +	pud = *pudp;

We might want to consider a READ_ONCE here.  I am not an expert on such
things, but I recall a similar issue being pointed out in the now obsolete
commit 27ceae9833843.
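
To illustrate the READ_ONCE suggestion, here is a minimal userspace sketch
of the snapshot / allocate / lock / recheck pattern. It is not kernel code:
entry_t, table_alloc(), and the atomic_flag spinlock are hypothetical
stand-ins for pud_t, hugetlb_pmd_alloc(), and the page table lock, and the
READ_ONCE macro is a simplified local definition. The point is only that
the entry is loaded exactly once into a local snapshot, so the "present"
check and the later comparison under the lock both see the same value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE(): one load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

typedef uintptr_t entry_t;		/* hypothetical stand-in for pud_t */
#define ENTRY_NONE ((entry_t)0)

static atomic_flag ptl = ATOMIC_FLAG_INIT;	/* stand-in for the PT lock */

static void ptl_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&ptl, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void ptl_unlock(void)
{
	atomic_flag_clear_explicit(&ptl, memory_order_release);
}

/*
 * Return the table referenced by *slot, installing a new one if the slot
 * is empty. Returns NULL if the allocation fails.
 */
static void *table_alloc(entry_t *slot)
{
	entry_t snap;
	void *new;

	assert(slot != NULL);
retry:
	/* Single load: every later check uses this one snapshot. */
	snap = READ_ONCE(*slot);
	if (snap != ENTRY_NONE)
		return (void *)snap;

	/* Allocate outside the lock, as the patch does. */
	new = calloc(1, 4096);
	if (!new)
		return NULL;

	ptl_lock();
	if (READ_ONCE(*slot) != snap) {
		/* Someone changed the entry since the snapshot: start over. */
		ptl_unlock();
		free(new);
		goto retry;
	}
	*slot = (entry_t)new;
	ptl_unlock();
	return new;
}
```

Without the single load, the compiler is in principle free to re-read the
entry between the presence check and the comparison under the lock, which
is the kind of problem the annotation is meant to rule out.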

-- 
Mike Kravetz

> +	if (likely(pud_present(pud)))
> +		return unlikely(pud_leaf(pud))
> +			? ERR_PTR(-EEXIST)
> +			: pmd_offset(pudp, addr);
> +	else if (!huge_pte_none(huge_ptep_get(hpte->ptep)))
> +		/*
> +		 * Not present and not none means that a swap entry lives here,
> +		 * and we can't get rid of it.
> +		 */
> +		return ERR_PTR(-EEXIST);
> +
> +	new = pmd_alloc_one(mm, addr);
> +	if (!new)
> +		return ERR_PTR(-ENOMEM);
> +
> +	spin_lock(ptl);
> +	if (!pud_same(pud, *pudp)) {
> +		spin_unlock(ptl);
> +		pmd_free(mm, new);
> +		goto retry;
> +	}
> +
> +	mm_inc_nr_pmds(mm);
> +	smp_wmb(); /* See comment in pmd_install() */
> +	pud_populate(mm, pudp, new);
> +	spin_unlock(ptl);
> +	return pmd_offset(pudp, addr);
> +}
> +
> +pte_t *hugetlb_pte_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte,
> +		unsigned long addr)
> +{
> +	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> +	pgtable_t new;
> +	pmd_t *pmdp;
> +	pmd_t pmd;
> +
> +	if (hpte->level != HUGETLB_LEVEL_PMD)
> +		return ERR_PTR(-EINVAL);
> +
> +	pmdp = (pmd_t *)hpte->ptep;
> +retry:
> +	pmd = *pmdp;
> +	if (likely(pmd_present(pmd)))
> +		return unlikely(pmd_leaf(pmd))
> +			? ERR_PTR(-EEXIST)
> +			: pte_offset_kernel(pmdp, addr);
> +	else if (!huge_pte_none(huge_ptep_get(hpte->ptep)))
> +		/*
> +		 * Not present and not none means that a swap entry lives here,
> +		 * and we can't get rid of it.
> +		 */
> +		return ERR_PTR(-EEXIST);
> +
> +	/*
> +	 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
> +	 * in page tables being allocated in high memory, needing a kmap to
> +	 * access. Instead, we call __pte_alloc_one directly with
> +	 * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
> +	 * memory.
> +	 */
> +	new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
> +	if (!new)
> +		return ERR_PTR(-ENOMEM);
> +
> +	spin_lock(ptl);
> +	if (!pmd_same(pmd, *pmdp)) {
> +		spin_unlock(ptl);
> +		pgtable_pte_page_dtor(new);
> +		__free_page(new);
> +		goto retry;
> +	}
> +
> +	mm_inc_nr_ptes(mm);
> +	smp_wmb(); /* See comment in pmd_install() */
> +	pmd_populate(mm, pmdp, new);
> +	spin_unlock(ptl);
> +	return pte_offset_kernel(pmdp, addr);
> +}
> +
>  static void coalesce_file_region(struct resv_map *resv, struct file_region *rg)
>  {
>  	struct file_region *nrg, *prg;
> -- 
> 2.38.0.135.g90850a2211-goog
> 

