linux-mm.kvack.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	James Houghton <jthoughton@google.com>,
	stable@vger.kernel.org, Oscar Salvador <osalvador@suse.de>,
	Muchun Song <muchun.song@linux.dev>,
	Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking
Date: Thu, 1 Aug 2024 09:52:30 -0400
Message-ID: <ZquTHvK0Rc0xBA4y@x1n>
In-Reply-To: <541f6c23-77ad-4d46-a8ed-fb18c9b635b3@redhat.com>

On Thu, Aug 01, 2024 at 10:50:18AM +0200, David Hildenbrand wrote:
> On 31.07.24 14:21, David Hildenbrand wrote:
> > We recently made GUP's common page table walking code to also walk hugetlb
> > VMAs without most hugetlb special-casing, preparing for the future of
> > having less hugetlb-specific page table walking code in the codebase.
> > Turns out that we missed one page table locking detail: page table locking
> > for hugetlb folios that are not mapped using a single PMD/PUD.
> 
> James, Peter,
> 
> the following seems to get the job done. Thoughts?

This looks OK to me, so my A-b can stay, but let me still comment; again,
these are all nitpicks.

> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8e462205400d..776dc3914d9e 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -938,10 +938,40 @@ static inline bool htlb_allow_alloc_fallback(int reason)
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  					   struct mm_struct *mm, pte_t *pte)
>  {
> -	if (huge_page_size(h) == PMD_SIZE)
> +	unsigned long size = huge_page_size(h);
> +
> +	VM_WARN_ON(size == PAGE_SIZE);
> +
> +	/*
> +	 * hugetlb must use the exact same PT locks as core-mm page table
> +	 * walkers would. When modifying a PTE table, hugetlb must take the
> +	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
> +	 * PT lock etc.
> +	 *
> +	 * The expectation is that any hugetlb folio smaller than a PMD is
> +	 * always mapped into a single PTE table and that any hugetlb folio
> +	 * smaller than a PUD (but at least as big as a PMD) is always mapped
> +	 * into a single PMD table.
> +	 *
> +	 * If that does not hold for an architecture, then that architecture
> +	 * must disable split PT locks such that all *_lockptr() functions
> +	 * will give us the same result: the per-MM PT lock.
> +	 *
> +	 * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
> +	 * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use the MM PT lock
> +	 * directly with a PMD hugetlb size, whereby core-mm would call
> +	 * pmd_lockptr() instead. However, in such configurations split PMD
> +	 * locks are disabled -- split locks don't make sense on a single
> +	 * PGDIR page table -- and the end result is the same.
> +	 */
> +	if (size >= P4D_SIZE)
> +		return &mm->page_table_lock;

I'd drop this so that the mm lock fallback is done below (especially since
in reality the pud lock is always the mm lock for now..).  Also, this line
reads as if there can be P4D-size huge pages, but in reality PUD is the
largest (nopxx doesn't count).  We also save some cycles in most cases if
it's removed.

> +	else if (size >= PUD_SIZE)
> +		return pud_lockptr(mm, (pud_t *) pte);
> +	else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))

I thought this HIGHPTE check can also be dropped?  With HIGHPTE there
should never be lower-than-PMD huge pages, or we're in trouble.  That's why
I kept one WARN_ON() in my pseudo code, but only before trying to take the
pte lockptr.
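
To make the two nitpicks above concrete, here is a rough userspace model,
not kernel code.  The MODEL_* sizes (x86-64-like, 4 KiB base pages), the
enum, and every function name are made up for illustration.  It assumes
!CONFIG_HIGHPTE and that pud_lockptr() resolves to the per-MM lock, as it
does today.  It only shows that dropping the P4D_SIZE check leaves the
selected lock unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for the model; real values are per-architecture. */
#define MODEL_PAGE_SIZE	(1ULL << 12)
#define MODEL_PMD_SIZE	(1ULL << 21)
#define MODEL_PUD_SIZE	(1ULL << 30)
#define MODEL_P4D_SIZE	(1ULL << 39)

enum model_lock { LOCK_MM, LOCK_PMD_TABLE, LOCK_PTE_TABLE };

/* Assumption: the PUD lock is the per-MM lock (no split PUD locks). */
static enum model_lock model_pud_lockptr(void)
{
	return LOCK_MM;
}

/* Selection order as in the quoted hunk, minus the HIGHPTE test. */
static enum model_lock lockptr_as_posted(unsigned long long size)
{
	assert(size != MODEL_PAGE_SIZE);
	if (size >= MODEL_P4D_SIZE)
		return LOCK_MM;
	else if (size >= MODEL_PUD_SIZE)
		return model_pud_lockptr();
	else if (size >= MODEL_PMD_SIZE)
		return LOCK_PMD_TABLE;
	return LOCK_PTE_TABLE;
}

/* Same, with the P4D_SIZE branch dropped as suggested above. */
static enum model_lock lockptr_without_p4d(unsigned long long size)
{
	assert(size != MODEL_PAGE_SIZE);
	if (size >= MODEL_PUD_SIZE)
		return model_pud_lockptr();
	else if (size >= MODEL_PMD_SIZE)
		return LOCK_PMD_TABLE;
	return LOCK_PTE_TABLE;
}

/* Returns 1 if both variants pick the same lock for a few sample sizes. */
static int model_variants_agree(void)
{
	const unsigned long long sizes[] = {
		64ULL << 10, MODEL_PMD_SIZE, 32ULL << 20,
		MODEL_PUD_SIZE, 16ULL << 30, MODEL_P4D_SIZE,
	};
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		if (lockptr_as_posted(sizes[i]) != lockptr_without_p4d(sizes[i]))
			return 0;
	return 1;
}
```

The equivalence holds only as long as the PUD lock is the per-MM lock; if
split PUD locks were ever introduced, the explicit P4D check would start to
matter.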

>  		return pmd_lockptr(mm, (pmd_t *) pte);
> -	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> -	return &mm->page_table_lock;
> +	/* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
> +	return ptep_lockptr(mm, pte);
>  }
>  #ifndef hugepages_supported
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a890a1731c14..bd219ac9c026 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2869,6 +2869,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  	return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
> +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
> +{
> +	BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
> +	BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
> +	return ptlock_ptr(virt_to_ptdesc(pte));
> +}

Great to know we can drop the mask..
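
One way to read why no mask is needed, sketched as a userspace model rather
than kernel code: if a PTE table never exceeds one page (what the second
BUILD_BUG_ON() in the quoted hunk checks), then every entry pointer in the
table falls into the same page and so resolves to the same descriptor.
The MODEL_* constants and function names below are hypothetical, and
model_desc_index() is only a stand-in for virt_to_ptdesc().

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12

/* Hypothetical stand-in for virt_to_ptdesc(): one descriptor per page. */
static uintptr_t model_desc_index(uintptr_t addr)
{
	return addr >> MODEL_PAGE_SHIFT;
}

/*
 * Returns 1 if every entry of a table starting at table_base resolves to
 * the same descriptor as the table base itself, 0 otherwise.
 */
static int model_all_entries_share_desc(uintptr_t table_base,
					unsigned int nr_entries,
					unsigned int entry_size)
{
	unsigned int i;

	for (i = 0; i < nr_entries; i++) {
		uintptr_t entry = table_base + (uintptr_t)i * entry_size;

		if (model_desc_index(entry) != model_desc_index(table_base))
			return 0;
	}
	return 1;
}
```

With a page-aligned base, 512 entries of 8 bytes fill exactly one 4 KiB page
and all share a descriptor, whereas 1024 such entries would spill into a
second page and the check fails.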

Thanks,

> +
>  static inline bool ptlock_init(struct ptdesc *ptdesc)
>  {
>  	/*
> @@ -2893,6 +2900,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>  	return &mm->page_table_lock;
>  }
> +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
> +{
> +	return &mm->page_table_lock;
> +}
>  static inline void ptlock_cache_init(void) {}
>  static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void ptlock_free(struct ptdesc *ptdesc) {}
> -- 
> 2.45.2
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Peter Xu


