Re: [PATCH] mm/hugetlb: Fix pgtable lock on pmd sharing

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <songmuchun@bytedance.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: Re: [PATCH] mm/hugetlb: Fix pgtable lock on pmd sharing
Date: Mon, 12 Jun 2023 13:44:18 -0700	[thread overview]
Message-ID: <20230612204418.GD3704@monkey> (raw)
In-Reply-To: <20230612160420.809818-1-peterx@redhat.com>

On 06/12/23 12:04, Peter Xu wrote:
> Huge pmd sharing operates on PUD not PMD, huge_pte_lock() is not suitable
> in this case because it should only work for last level pte changes, while
> pmd sharing is always one level higher.

Right!  That lock does not prevent someone else from concurrently modifying
the PUD.

> Meanwhile, here we're locking over the spte pgtable lock which is even not
> a lock for current mm but someone else's.
> 
> It seems even racy on operating on the lock, as after put_page() of the
> spte pgtable page logically the page can be released, so at least the
> spin_unlock() needs to be done after the put_page().

Agree.

> No report I am aware, I'm not even sure whether it'll just work on taking
> the spte pmd lock, because while we're holding i_mmap read lock it probably
> means the vma interval tree is frozen, all pte allocators over this pud
> entry could always find the specific svma and spte page, so maybe they'll
> serialize on this spte page lock?

It seems they would serialize IF they were trying to instantiate the same
shared page.  However, I suppose another thread could be trying to
instantiate something totally different in the VA range represented by that
PUD.  In this case, it seems like there would be no synchronization.

>                                    Even so, doesn't seem to be expected.
> It just seems to be an accident of cb900f412154.
> 
> Fix it with the proper pud lock (which is the mm's page_table_lock).
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Fixes: cb900f412154 ("mm, hugetlb: convert hugetlbfs to use split pmd lock")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/hugetlb.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Agree with this change.  But, it does make one wonder if the pud_clear()
in huge_pmd_unshare is safe.  Like here, we really should be holding the
higher level lock but are holding the PMD lock.
-- 
Mike Kravetz

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index dfa412d8cb30..270ec0ecd5a1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7133,7 +7133,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	unsigned long saddr;
>  	pte_t *spte = NULL;
>  	pte_t *pte;
> -	spinlock_t *ptl;
>  
>  	i_mmap_lock_read(mapping);
>  	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
> @@ -7154,7 +7153,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (!spte)
>  		goto out;
>  
> -	ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
> +	spin_lock(&mm->page_table_lock);
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> @@ -7162,7 +7161,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	} else {
>  		put_page(virt_to_page(spte));
>  	}
> -	spin_unlock(ptl);
> +	spin_unlock(&mm->page_table_lock);
>  out:
>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  	i_mmap_unlock_read(mapping);
> -- 
> 2.40.1
>

next prev parent reply	other threads:[~2023-06-12 20:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-12 16:04 [PATCH] mm/hugetlb: Fix pgtable lock on pmd sharing Peter Xu
2023-06-12 20:44 ` Mike Kravetz [this message]
2023-06-12 21:35   ` Peter Xu
2023-06-12 21:44     ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230612204418.GD3704@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=peterx@redhat.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.