* [PATCH] mm/khugepaged: Fix ->anon_vma race
@ 2023-01-11 13:33 Jann Horn
From: Jann Horn @ 2023-01-11 13:33 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Kirill A. Shutemov, Zach O'Keefe, linux-kernel,
	David Hildenbrand, Yang Shi

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked. retract_page_tables() bails out if an ->anon_vma is
attached, but does this check before holding the mmap lock (as the comment
above the check explains).

If we racily merge an existing ->anon_vma (shared with a child process)
from a neighboring VMA, subsequent rmap traversals on pages belonging to
the child will be able to see the page tables that we are concurrently
removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that there
really is no concurrent page table access.

Reported-by: Zach O'Keefe <zokeefe@google.com>
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
zokeefe@ pointed out to me that the current code (after my last round of patches)
can hit a lockdep assert by racing, and after staring at it a bit I've
convinced myself that this is a real, preexisting bug.
(I haven't written a reproducer for it though. One way to hit it might be
something along the lines of:

 - set up a process A with a private-file-mapping VMA V1
 - let A fork() to create process B, thereby copying V1 in A to V1' in B
 - let B extend the end of V1'
 - let B put some anon pages into the extended part of V1'
 - let A map a new private-file-mapping VMA V2 directly behind V1, without
   an anon_vma
[race begins here]
  - in A's thread 1: begin retract_page_tables() on V2, run through first
    ->anon_vma check
  - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it
    merges the anon_vma of V1 (which implies V1 and V2 must be mapping the
    same file at compatible offsets)
  - in B: trigger rmap traversal on anon page in V1')
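
The shape of the race, and of the re-check that closes it, can be modelled in user space. The following is only a sketch under stated assumptions: `struct model_vma`, `model_retract()`, `model_attach_anon()` and the counter-based lock are hypothetical stand-ins, not kernel code, and the race window is forced deterministically through a hook instead of a second thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct model_vma {
	atomic_int mmap_lock;      /* 0 = free, >0 = readers, -1 = writer */
	_Atomic(void *) anon_vma;  /* NULL until an anon_vma is attached */
	bool tables_retracted;     /* set once the "page tables" are removed */
};

enum model_result { MODEL_RETRACTED, MODEL_SKIP_ANON, MODEL_SKIP_BUSY };

/* Called between the unlocked check and the trylock to simulate the race. */
typedef void (*model_race_hook)(struct model_vma *vma);

static bool model_read_trylock(struct model_vma *vma)
{
	int s = atomic_load(&vma->mmap_lock);

	while (s >= 0) {
		/* On failure, s is reloaded with the current lock state. */
		if (atomic_compare_exchange_weak(&vma->mmap_lock, &s, s + 1))
			return true;
	}
	return false;	/* a writer holds the lock */
}

static bool model_write_trylock(struct model_vma *vma)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&vma->mmap_lock, &expected, -1);
}

static void model_write_unlock(struct model_vma *vma)
{
	assert(atomic_load(&vma->mmap_lock) == -1);	/* caller must be the writer */
	atomic_store(&vma->mmap_lock, 0);
}

/* Attach an anon_vma while holding the lock only in read mode. */
static void model_attach_anon(struct model_vma *vma)
{
	static int dummy_anon_vma;

	if (!model_read_trylock(vma))
		return;
	atomic_store(&vma->anon_vma, (void *)&dummy_anon_vma);
	atomic_fetch_sub(&vma->mmap_lock, 1);	/* read unlock */
}

static enum model_result model_retract(struct model_vma *vma,
				       model_race_hook hook, bool recheck)
{
	/* Unlocked pre-check: a cheap filter whose answer may go stale. */
	if (atomic_load(&vma->anon_vma))
		return MODEL_SKIP_ANON;

	if (hook)
		hook(vma);	/* race window */

	if (!model_write_trylock(vma))
		return MODEL_SKIP_BUSY;

	/* Re-check now that no reader can attach an anon_vma anymore. */
	if (recheck && atomic_load(&vma->anon_vma)) {
		model_write_unlock(vma);
		return MODEL_SKIP_ANON;
	}

	vma->tables_retracted = true;
	model_write_unlock(vma);
	return MODEL_RETRACTED;
}
```

Calling `model_retract(&vma, model_attach_anon, false)` models the pre-patch behaviour: the tables are retracted even though an anon_vma was attached inside the window. With `recheck` set to `true` the same call bails out, which is the behaviour the patch aims for.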

 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..0bfed37f3a3b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma) {
+		if (READ_ONCE(vma->anon_vma)) {
 			result = SCAN_PAGE_ANON;
 			goto next;
 		}
@@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		result = SCAN_PTE_MAPPED_HUGEPAGE;
 		if ((cc->is_khugepaged || is_target) &&
 		    mmap_write_trylock(mm)) {
+			/*
+			 * Re-check whether we have an ->anon_vma, because
+			 * collapse_and_free_pmd() requires that either no
+			 * ->anon_vma exists or the anon_vma is locked.
+			 * We already checked ->anon_vma above, but that check
+			 * is racy because ->anon_vma can be populated under the
+			 * mmap lock in read mode.
+			 */
+			if (vma->anon_vma) {
+				result = SCAN_PAGE_ANON;
+				goto unlock_next;
+			}
 			/*
 			 * When a vma is registered with uffd-wp, we can't
 			 * recycle the pmd pgtable because there can be pte

base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH] mm/khugepaged: fix ->anon_vma race
@ 2025-09-04 14:26 Bjoern Doebel
From: Bjoern Doebel @ 2025-09-04 14:26 UTC (permalink / raw)
  To: doebel
  Cc: Jann Horn, Zach O'Keefe, Kirill A. Shutemov, Yang Shi,
	David Hildenbrand, stable, Andrew Morton

From: Jann Horn <jannh@google.com>

commit 023f47a8250c6bdb4aebe744db4bf7f73414028b upstream.

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.

Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).

If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.

Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.
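
The locking rule described above (any one applicable lock suffices to walk page tables, so all applicable locks are needed to remove them) can be written down as a small predicate. This is a hedged illustration only: the flag names, `struct lock_model_vma` and both helper functions are hypothetical, and the model ignores the distinction between read and write mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock flags for the model; not kernel identifiers. */
enum {
	LOCK_MMAP     = 1u << 0,
	LOCK_ANON_VMA = 1u << 1,
	LOCK_MAPPING  = 1u << 2,
};

/* Which objects the modelled VMA is associated with. */
struct lock_model_vma {
	bool has_anon_vma;
	bool has_mapping;
};

/* Set of locks under which some walker may legitimately traverse. */
static unsigned int walker_locks(const struct lock_model_vma *vma)
{
	unsigned int locks = LOCK_MMAP;

	assert(vma != NULL);
	if (vma->has_anon_vma)
		locks |= LOCK_ANON_VMA;
	if (vma->has_mapping)
		locks |= LOCK_MAPPING;
	return locks;
}

/* Walking needs at least one of the applicable locks. */
static bool may_walk_page_tables(const struct lock_model_vma *vma,
				 unsigned int held)
{
	return (held & walker_locks(vma)) != 0;
}

/* Removing must exclude every possible walker: all applicable locks. */
static bool may_remove_page_tables(const struct lock_model_vma *vma,
				   unsigned int held)
{
	unsigned int need = walker_locks(vma);

	return (held & need) == need;
}
```

In this model, holding `LOCK_MMAP | LOCK_MAPPING` is enough to remove page tables for a file-only VMA, but stops being enough as soon as `has_anon_vma` becomes true, which is the situation the repeated check is meant to detect.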

Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[doebel@amazon.de: Kernel 5.15 uses a different control flow pattern,
    context adjustments.]
Signed-off-by: Bjoern Doebel <doebel@amazon.de>
---
 mm/khugepaged.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 203792e70ac1..e318c1abc81f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1609,7 +1609,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma)
+		if (READ_ONCE(vma->anon_vma))
 			continue;
 		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		if (addr & ~HPAGE_PMD_MASK)
@@ -1631,6 +1631,19 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 			if (!khugepaged_test_exit(mm)) {
 				struct mmu_notifier_range range;
 
+				/*
+				 * Re-check whether we have an ->anon_vma, because
+				 * collapse_and_free_pmd() requires that either no
+				 * ->anon_vma exists or the anon_vma is locked.
+				 * We already checked ->anon_vma above, but that check
+				 * is racy because ->anon_vma can be populated under the
+				 * mmap lock in read mode.
+				 */
+				if (vma->anon_vma) {
+					mmap_write_unlock(mm);
+					continue;
+				}
+
 				mmu_notifier_range_init(&range,
 							MMU_NOTIFY_CLEAR, 0,
 							NULL, mm, addr,
-- 
2.47.3







Thread overview: 21+ messages
2023-01-11 13:33 [PATCH] mm/khugepaged: Fix ->anon_vma race Jann Horn
2023-01-12  1:06 ` Yang Shi
2023-01-13 19:36   ` Jann Horn
2023-01-12  8:56 ` Kirill A. Shutemov
2023-01-12 18:12   ` Yang Shi
2023-01-13  0:10     ` Kirill A. Shutemov
2023-01-13  3:22       ` Yang Shi
2023-01-13 19:28   ` Jann Horn
2023-01-15 19:06     ` Kirill A. Shutemov
2023-01-16 12:06       ` Jann Horn
2023-01-16 12:34         ` Kirill A. Shutemov
2023-01-16 12:54           ` Jann Horn
2023-01-16 13:07           ` David Hildenbrand
2023-01-16 13:47             ` Kirill A. Shutemov
2023-01-23 11:07               ` David Hildenbrand
2023-01-24  0:51                 ` Kirill A. Shutemov
2023-01-24 10:19                   ` David Hildenbrand
2023-01-17 18:57       ` Yang Shi
2023-01-17 19:12 ` Jann Horn
2023-01-17 22:55   ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2025-09-04 14:26 [PATCH] mm/khugepaged: fix " Bjoern Doebel
