All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wei Chen <harperchen1110@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Mon, 24 Oct 2022 16:14:23 -0700	[thread overview]
Message-ID: <Y1ccT8Q9ESmR3+X9@monkey> (raw)
In-Reply-To: <Y1cJz6i5YMNfSeAm@monkey>

On 10/24/22 14:55, Mike Kravetz wrote:
> On 10/22/22 19:50, Mike Kravetz wrote:
> > madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the
> > page tables associated with the address range.  For hugetlb vmas,
> > zap_page_range will call __unmap_hugepage_range_final.  However,
> > __unmap_hugepage_range_final assumes the passed vma is about to be
> > removed and deletes the vma_lock to prevent pmd sharing as the vma is
> > on the way out.  In the case of madvise(MADV_DONTNEED) the vma remains,
> > but the missing vma_lock prevents pmd sharing and could potentially
> > lead to issues with truncation/fault races.
> > 
> > This issue was originally reported here [1] as a BUG triggered in
> > page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
> > vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
> > prevent pmd sharing.  Subsequent faults on this vma were confused as
> > VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping
> > was not set in new pages added to the page table.  This resulted in
> > pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.
> > 
> > Create a new routine clear_hugetlb_page_range() that can be called from
> > madvise(MADV_DONTNEED) for hugetlb vmas.  It has the same setup as
> > zap_page_range, but does not delete the vma_lock.
> 
> After seeing a syzbot use after free report [2] that is also addressed by
> this patch, I started thinking ... 
> 
> When __unmap_hugepage_range_final was created, the only time unmap_single_vma
> was called for hugetlb vmas was during process exit time via exit_mmap.  I got
> in trouble when I added a call via madvise(MADV_DONTNEED) which calls
> zap_page_range.  This patch takes care of that calling path by having
> madvise(MADV_DONTNEED) call a new routine clear_hugetlb_page_range instead of
> zap_page_range for hugetlb vmas.  The use after free bug had me auditing code
> paths to make sure __unmap_hugepage_range_final was REALLY only called at
> process exit time.  If not, and we could fault on a vma after calling
> __unmap_hugepage_range_final we would be in trouble.
> 
> My thought was, what if we had __unmap_hugepage_range_final check mm->mm_users
> to determine if it was being called in the process exit path?  If !mm_users,
> then we can delete the vma_lock to prevent pmd sharing as we know the process
> is exiting.  If not, we do not delete the lock.  That seems to be more robust
> and would prevent issues if someone accidentally introduces a new code path
> where __unmap_hugepage_range_final (unmap_single_vma for a hugetlb vma)
> could be called outside process exit context.
> 
> Thoughts?
> 
> [2] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/

Sorry if this seems like I am talking to myself.  Here is a proposed v3 as
described above.

From 1466fd43e180ede3f6479d1dca4e7f350f86f80b Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon, 24 Oct 2022 15:40:05 -0700
Subject: [PATCH v3] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED
 processing

When hugetlb madvise(MADV_DONTNEED) support was added, the existing
code to call zap_page_range() to clear the page tables associated
with the address range was not modified.  However, for hugetlb vmas
zap_page_range will call __unmap_hugepage_range_final. This routine
assumes the passed hugetlb vma is about to be removed and deletes
the vma_lock to prevent pmd sharing as the vma is on the way out.  In
the case of madvise(MADV_DONTNEED) the vma remains, but the missing
vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.

This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag
to prevent pmd sharing.  Subsequent faults on this vma were confused
as VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping
was not set in new pages added to the page table.  This resulted in
pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.

__unmap_hugepage_range_final was originally designed only to be called
in the context of process exit (exit_mmap).  It is now called in the
context of madvise(MADV_DONTNEED).  Restructure the routine and check
for !mm_users which indicates it is being called in the context of
process exit.  If being called in process exit context, delete the
vma_lock.  Otherwise, just unmap and leave the lock.  Since the routine
is called in more than just process exit context, rename to eliminate
'final' as __unmap_hugetlb_page_range.

[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/

Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
v3 - Check for !mm_users in __unmap_hugepage_range_final instead of
     creating a separate function.

 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 30 ++++++++++++++++++++----------
 mm/memory.c             |  2 +-
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..bc19a1f6ca10 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,7 +158,7 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
 			  struct page *ref_page, zap_flags_t zap_flags);
@@ -418,7 +418,7 @@ static inline unsigned long hugetlb_change_protection(
 	return 0;
 }
 
-static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static inline void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
 			unsigned long end, struct page *ref_page,
 			zap_flags_t zap_flags)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 931789a8f734..3fe1152c3c20 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5202,27 +5202,37 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		tlb_flush_mmu_tlbonly(tlb);
 }
 
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page,
 			  zap_flags_t zap_flags)
 {
+	struct mm_struct *mm = vma->vm_mm;
+
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 
 	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
-	 * Unlock and free the vma lock before releasing i_mmap_rwsem.  When
-	 * the vma_lock is freed, this makes the vma ineligible for pmd
-	 * sharing.  And, i_mmap_rwsem is required to set up pmd sharing.
-	 * This is important as page tables for this unmapped range will
-	 * be asynchrously deleted.  If the page tables are shared, there
-	 * will be issues when accessed by someone else.
+	 * Free the vma_lock here if process exiting
 	 */
-	__hugetlb_vma_unlock_write_free(vma);
-
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	if (!atomic_read(&mm->mm_users)) {
+		/*
+		 * Unlock and free the vma lock before releasing i_mmap_rwsem.
+		 * When the vma_lock is freed, this makes the vma ineligible
+		 * for pmd sharing.  And, i_mmap_rwsem is required to set up
+		 * pmd sharing.  This is important as page tables for this
+		 * unmapped range will be asynchrously deleted.  If the page
+		 * tables are shared, there will be issues when accessed by
+		 * someone else.
+		 */
+		__hugetlb_vma_unlock_write_free(vma);
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+	} else {
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+		hugetlb_vma_unlock_write(vma);
+	}
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
diff --git a/mm/memory.c b/mm/memory.c
index 8e72f703ed99..1de8ea504047 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1687,7 +1687,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			if (vma->vm_file) {
 				zap_flags_t zap_flags = details ?
 				    details->zap_flags : 0;
-				__unmap_hugepage_range_final(tlb, vma, start, end,
+				__unmap_hugetlb_page_range(tlb, vma, start, end,
 							     NULL, zap_flags);
 			}
 		} else
-- 
2.37.3


  reply	other threads:[~2022-10-25  0:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-23  2:50 [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing Mike Kravetz
2022-10-24 21:55 ` Mike Kravetz
2022-10-24 23:14   ` Mike Kravetz [this message]
2022-10-26 21:42 ` Peter Xu
2022-10-26 23:54   ` Mike Kravetz
2022-10-27  1:12     ` Peter Xu
2022-10-28 15:23       ` Mike Kravetz
2022-10-28 16:13         ` Peter Xu
2022-10-28 21:17           ` Mike Kravetz
2022-10-28 23:20             ` Peter Xu
2022-10-30  0:15               ` Mike Kravetz
2022-10-30  0:54                 ` Nadav Amit
2022-10-30 18:43                   ` Peter Xu
2022-10-30 18:52                     ` Nadav Amit
2022-10-31  1:44                       ` Mike Kravetz
2022-11-02 19:24                         ` Peter Xu
2022-11-07 20:01                           ` Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y1ccT8Q9ESmR3+X9@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=harperchen1110@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.