From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	James Houghton <jthoughton@google.com>,
	Jann Horn <jannh@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Muchun Song <songmuchun@bytedance.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 09/10] mm/hugetlb: Make page_vma_mapped_walk() safe to pmd unshare
Date: Tue, 6 Dec 2022 12:43:12 -0500
Message-ID: <Y49/MPNKib6eDfqk@x1n>
In-Reply-To: <Y49+aYHTy/UwV7JQ@x1n>

[-- Attachment #1: Type: text/plain, Size: 4753 bytes --]

On Tue, Dec 06, 2022 at 12:39:53PM -0500, Peter Xu wrote:
> On Tue, Dec 06, 2022 at 09:10:00AM -0800, Mike Kravetz wrote:
> > On 12/05/22 15:52, Mike Kravetz wrote:
> > > On 11/29/22 14:35, Peter Xu wrote:
> > > > Since page_vma_mapped_walk() walks the pgtable, it needs the vma lock
> > > > to make sure the pgtable page will not be freed concurrently.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  include/linux/rmap.h | 4 ++++
> > > >  mm/page_vma_mapped.c | 5 ++++-
> > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > > index bd3504d11b15..a50d18bb86aa 100644
> > > > --- a/include/linux/rmap.h
> > > > +++ b/include/linux/rmap.h
> > > > @@ -13,6 +13,7 @@
> > > >  #include <linux/highmem.h>
> > > >  #include <linux/pagemap.h>
> > > >  #include <linux/memremap.h>
> > > > +#include <linux/hugetlb.h>
> > > >  
> > > >  /*
> > > >   * The anon_vma heads a list of private "related" vmas, to scan if
> > > > @@ -408,6 +409,9 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
> > > >  		pte_unmap(pvmw->pte);
> > > >  	if (pvmw->ptl)
> > > >  		spin_unlock(pvmw->ptl);
> > > > +	/* This needs to be after unlock of the spinlock */
> > > > +	if (is_vm_hugetlb_page(pvmw->vma))
> > > > +		hugetlb_vma_unlock_read(pvmw->vma);
> > > >  }
> > > >  
> > > >  bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);
> > > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > > index 93e13fc17d3c..f94ec78b54ff 100644
> > > > --- a/mm/page_vma_mapped.c
> > > > +++ b/mm/page_vma_mapped.c
> > > > @@ -169,10 +169,13 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > > >  		if (pvmw->pte)
> > > >  			return not_found(pvmw);
> > > >  
> > > > +		hugetlb_vma_lock_read(vma);
> > > >  		/* when pud is not present, pte will be NULL */
> > > >  		pvmw->pte = huge_pte_offset(mm, pvmw->address, size);
> > > > -		if (!pvmw->pte)
> > > > +		if (!pvmw->pte) {
> > > > +			hugetlb_vma_unlock_read(vma);
> > > >  			return false;
> > > > +		}
> > > >  
> > > >  		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
> > > >  		if (!check_pte(pvmw))
> > > 
> > > I think this is going to cause try_to_unmap() to always fail for hugetlb
> > > shared pages.  See try_to_unmap_one:
> > > 
> > > 	while (page_vma_mapped_walk(&pvmw)) {
> > > 		...
> > > 		if (folio_test_hugetlb(folio)) {
> > > 			...
> > > 			/*
> > >                          * To call huge_pmd_unshare, i_mmap_rwsem must be
> > >                          * held in write mode.  Caller needs to explicitly
> > >                          * do this outside rmap routines.
> > >                          *
> > >                          * We also must hold hugetlb vma_lock in write mode.
> > >                          * Lock order dictates acquiring vma_lock BEFORE
> > >                          * i_mmap_rwsem.  We can only try lock here and fail
> > >                          * if unsuccessful.
> > >                          */
> > >                         if (!anon) {
> > >                                 VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> > >                                 if (!hugetlb_vma_trylock_write(vma)) {
> > >                                         page_vma_mapped_walk_done(&pvmw);
> > >                                         ret = false;
> > > 				}
> > > 
> > > 
> > > Can not think of a great solution right now.
> > 
> > Thought of this last night ...
> > 
> > Perhaps we do not need vma_lock in this code path (not sure about all
> > page_vma_mapped_walk calls).  Why?  We already hold i_mmap_rwsem.
> 
> Exactly.  The only concern is when it's not in a rmap.
> 
> I'm actually preparing something that adds a new flag to PVMW, like:
> 
> #define PVMW_HUGETLB_NEEDS_LOCK	(1 << 2)
> 
> But maybe we don't need that at all, since I had a closer look the only
> outliers of not using a rmap is:
> 
> __replace_page
> write_protect_page
> 
> I'm pretty sure ksm doesn't have hugetlb involved, then the other one is
> uprobe (uprobe_write_opcode).  I think it's the same.  If it's true, we can
> simply drop this patch.  Then we also have hugetlb_walk and the lock checks
> there guarantee that we're safe anyways.
> 
> Potentially we can document this fact, which I also attached a comment
> patch just for it to be appended to the end of the patchset.
> 
> Mike, let me know what do you think.
> 
> Andrew, if this patch to be dropped then the last patch may not cleanly
> apply.  Let me know if you want a full repost of the things.

The documentation patch, which can be appended to the end of this series, is
attached.  It references hugetlb_walk(), so it needs to be the last patch.

-- 
Peter Xu

[-- Attachment #2: 0001-mm-hugetlb-Document-why-page_vma_mapped_walk-is-safe.patch --]
[-- Type: text/plain, Size: 1404 bytes --]

From 754c2180804e9e86accf131573cbd956b8c62829 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 6 Dec 2022 12:36:04 -0500
Subject: [PATCH] mm/hugetlb: Document why page_vma_mapped_walk() is safe to
 walk
Content-type: text/plain

Taking vma lock here is not needed for now because all potential hugetlb
walkers here should have i_mmap_rwsem held.  Document the fact.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/page_vma_mapped.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index e97b2e23bd28..2e59a0419d22 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -168,8 +168,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		/* The only possible mapping was handled on last iteration */
 		if (pvmw->pte)
 			return not_found(pvmw);
-
-		/* when pud is not present, pte will be NULL */
+		/*
+		 * NOTE: we don't need explicit lock here to walk the
+		 * hugetlb pgtable because either (1) potential callers of
+		 * hugetlb pvmw currently holds i_mmap_rwsem, or (2) the
+		 * caller will not walk a hugetlb vma (e.g. ksm or uprobe).
+		 * When one day this rule breaks, one will get a warning
+		 * in hugetlb_walk(), and then we'll figure out what to do.
+		 */
 		pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
 		if (!pvmw->pte)
 			return false;
-- 
2.37.3

