From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wei Chen <harperchen1110@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Mon, 24 Oct 2022 16:14:23 -0700
Message-ID: <Y1ccT8Q9ESmR3+X9@monkey>
In-Reply-To: <Y1cJz6i5YMNfSeAm@monkey>

On 10/24/22 14:55, Mike Kravetz wrote:
> On 10/22/22 19:50, Mike Kravetz wrote:
> > madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the
> > page tables associated with the address range.  For hugetlb vmas,
> > zap_page_range will call __unmap_hugepage_range_final.  However,
> > __unmap_hugepage_range_final assumes the passed vma is about to be
> > removed and deletes the vma_lock to prevent pmd sharing as the vma is
> > on the way out.  In the case of madvise(MADV_DONTNEED) the vma remains,
> > but the missing vma_lock prevents pmd sharing and could potentially
> > lead to issues with truncation/fault races.
> > 
> > This issue was originally reported here [1] as a BUG triggered in
> > page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
> > vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
> > prevent pmd sharing.  Subsequent faults on this vma were confused
> > because VM_MAYSHARE indicates a sharable vma, but the flag was no
> > longer set, so page_mapping was not set in new pages added to the
> > page table.  This resulted in pages that appeared anonymous in a
> > VM_SHARED vma and triggered the BUG.
> > 
> > Create a new routine clear_hugetlb_page_range() that can be called from
> > madvise(MADV_DONTNEED) for hugetlb vmas.  It has the same setup as
> > zap_page_range, but does not delete the vma_lock.
> 
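For reference, the v2 clear_hugetlb_page_range looked roughly like the
sketch below.  This is not the exact v2 diff; the mmu_gather/mmu_notifier
setup is assumed to mirror zap_page_range, and it calls the non-"final"
unmap so the vma_lock survives:

	/* Sketch only -- not the actual v2 diff */
	void clear_hugetlb_page_range(struct vm_area_struct *vma,
				      unsigned long start, unsigned long end)
	{
		struct mmu_notifier_range range;
		struct mmu_gather tlb;

		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma,
					vma->vm_mm, start, end);
		tlb_gather_mmu(&tlb, vma->vm_mm);
		update_hiwater_rss(vma->vm_mm);
		mmu_notifier_invalidate_range_start(&range);

		/* Same locking as the "final" variant ... */
		hugetlb_vma_lock_write(vma);
		i_mmap_lock_write(vma->vm_file->f_mapping);

		__unmap_hugepage_range(&tlb, vma, start, end, NULL, 0);

		/* ... but unlock without freeing the vma_lock. */
		i_mmap_unlock_write(vma->vm_file->f_mapping);
		hugetlb_vma_unlock_write(vma);

		mmu_notifier_invalidate_range_end(&range);
		tlb_finish_mmu(&tlb);
	}
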
> After seeing a syzbot use-after-free report [2] that is also addressed by
> this patch, I started thinking ... 
> 
> When __unmap_hugepage_range_final was created, the only time unmap_single_vma
> was called for hugetlb vmas was during process exit time via exit_mmap.  I got
> in trouble when I added a call via madvise(MADV_DONTNEED) which calls
> zap_page_range.  This patch takes care of that calling path by having
> madvise(MADV_DONTNEED) call a new routine clear_hugetlb_page_range instead of
> zap_page_range for hugetlb vmas.  The use-after-free bug had me auditing code
> paths to make sure __unmap_hugepage_range_final was REALLY only called at
> process exit time.  If not, and we could fault on a vma after calling
> __unmap_hugepage_range_final we would be in trouble.
> 
> My thought was, what if we had __unmap_hugepage_range_final check mm->mm_users
> to determine if it was being called in the process exit path?  If !mm_users,
> then we can delete the vma_lock to prevent pmd sharing as we know the process
> is exiting.  If not, we do not delete the lock.  That seems to be more robust
> and would prevent issues if someone accidentally introduces a new code path
> where __unmap_hugepage_range_final (unmap_single_vma for a hugetlb vma)
> could be called outside process exit context.
> 
> Thoughts?
> 
> [2] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/

Sorry if this seems like I am talking to myself.  Here is a proposed v3 as
described above.

From 1466fd43e180ede3f6479d1dca4e7f350f86f80b Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon, 24 Oct 2022 15:40:05 -0700
Subject: [PATCH v3] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED
 processing

When hugetlb madvise(MADV_DONTNEED) support was added, the existing
call to zap_page_range() to clear the page tables associated with
the address range was left unchanged.  However, for hugetlb vmas
zap_page_range will call __unmap_hugepage_range_final. This routine
assumes the passed hugetlb vma is about to be removed and deletes
the vma_lock to prevent pmd sharing as the vma is on the way out.  In
the case of madvise(MADV_DONTNEED) the vma remains, but the missing
vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.
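
For context, the MADV_DONTNEED path that reaches this code is roughly
the following (paraphrased from 6.0-era mm/madvise.c, not an exact
quote):

	/* paraphrased from mm/madvise.c */
	static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
						unsigned long start,
						unsigned long end)
	{
		/* for a hugetlb vma this reaches __unmap_hugepage_range_final */
		zap_page_range(vma, start, end - start);
		return 0;
	}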

This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag
to prevent pmd sharing.  Subsequent faults on this vma were confused
because VM_MAYSHARE indicates a sharable vma, but the flag was no
longer set, so page_mapping was not set in new pages added to the
page table.  This resulted in pages that appeared anonymous in a
VM_SHARED vma and triggered the BUG.
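
Before the vma_lock existed, the tail of __unmap_hugepage_range_final
did roughly the following (paraphrased from the pre-vma_lock source):

	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);

	/*
	 * Clear this flag so that x86's huge_pmd_share
	 * page_table_shareable test will fail on a vma being torn
	 * down, and not grab a page table on its way out.
	 */
	vma->vm_flags &= ~VM_MAYSHARE;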

__unmap_hugepage_range_final was originally designed only to be called
in the context of process exit (exit_mmap).  It is now called in the
context of madvise(MADV_DONTNEED).  Restructure the routine to check
for !mm_users, which indicates it is being called in the context of
process exit.  If called in process exit context, delete the vma_lock.
Otherwise, just unmap and leave the lock in place.  Since the routine
is now called in more than just the process exit context, rename it to
__unmap_hugetlb_page_range to drop 'final' from the name.

[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/

Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
v3 - Check for !mm_users in __unmap_hugepage_range_final instead of
     creating a separate function.

 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 30 ++++++++++++++++++++----------
 mm/memory.c             |  2 +-
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..bc19a1f6ca10 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,7 +158,7 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
 			  struct page *ref_page, zap_flags_t zap_flags);
@@ -418,7 +418,7 @@ static inline unsigned long hugetlb_change_protection(
 	return 0;
 }
 
-static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static inline void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
 			unsigned long end, struct page *ref_page,
 			zap_flags_t zap_flags)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 931789a8f734..3fe1152c3c20 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5202,27 +5202,37 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		tlb_flush_mmu_tlbonly(tlb);
 }
 
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page,
 			  zap_flags_t zap_flags)
 {
+	struct mm_struct *mm = vma->vm_mm;
+
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 
 	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
-	 * Unlock and free the vma lock before releasing i_mmap_rwsem.  When
-	 * the vma_lock is freed, this makes the vma ineligible for pmd
-	 * sharing.  And, i_mmap_rwsem is required to set up pmd sharing.
-	 * This is important as page tables for this unmapped range will
-	 * be asynchrously deleted.  If the page tables are shared, there
-	 * will be issues when accessed by someone else.
+	 * Free the vma_lock here if the process is exiting
 	 */
-	__hugetlb_vma_unlock_write_free(vma);
-
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	if (!atomic_read(&mm->mm_users)) {
+		/*
+		 * Unlock and free the vma lock before releasing i_mmap_rwsem.
+		 * When the vma_lock is freed, this makes the vma ineligible
+		 * for pmd sharing.  And, i_mmap_rwsem is required to set up
+		 * pmd sharing.  This is important as page tables for this
+		 * unmapped range will be asynchronously deleted.  If the page
+		 * tables are shared, there will be issues when accessed by
+		 * someone else.
+		 */
+		__hugetlb_vma_unlock_write_free(vma);
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+	} else {
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+		hugetlb_vma_unlock_write(vma);
+	}
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
diff --git a/mm/memory.c b/mm/memory.c
index 8e72f703ed99..1de8ea504047 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1687,7 +1687,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			if (vma->vm_file) {
 				zap_flags_t zap_flags = details ?
 				    details->zap_flags : 0;
-				__unmap_hugepage_range_final(tlb, vma, start, end,
+				__unmap_hugetlb_page_range(tlb, vma, start, end,
 							     NULL, zap_flags);
 			}
 		} else
-- 
2.37.3


