From: yizhang089@gmail.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, david@redhat.com,
	william.kucharski@linux.dev, karol.wachowski@linux.intel.com,
	yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
	yizhang089@gmail.com, liuyongqiang13@huawei.com,
	wangkefeng.wang@huawei.com, yangerkun@huawei.com
Subject: [PATCH v3] mm: do not install PMD mappings when handling a COW fault
Date: Wed, 20 May 2026 11:16:24 -0400
Message-ID: <20260520151624.78370-1-yizhang089@gmail.com>

From: Zhang Yi <yi.zhang@huawei.com>

When pinning pages with FOLL_LONGTERM in a CoW VMA that maps a
PMD-aligned (2MB on x86) large folio, follow_page_mask() fails to obtain
a valid anonymous page, resulting in an infinite loop. The triggering
sequence is as follows:

1. The user calls mmap() with a 2MB size in MAP_PRIVATE mode on a file
   that has a 2MB large folio installed in the page cache.

   addr = mmap(NULL, 2*1024*1024, PROT_READ, MAP_PRIVATE, file_fd, 0);

2. A kernel driver passes this mapped address to pin_user_pages_fast()
   in FOLL_LONGTERM mode.

   pin_user_pages_fast(addr, 512, FOLL_LONGTERM, pages);

  ->  pin_user_pages_fast()
  |   gup_fast_fallback()
  |    __gup_longterm_locked()
  |     __get_user_pages_locked()
  |      __get_user_pages()
  |       follow_page_mask()
  |        follow_p4d_mask()
  |         follow_pud_mask()
  |          follow_pmd_mask() //pmd_leaf(pmdval) is true because the
  |                            //huge PMD is installed. This is normal
  |                            //in the first round, but it shouldn't
  |                            //happen in the second round.
  |           follow_huge_pmd() //require an anonymous page
  |            return -EMLINK;
  |   faultin_page()
  |    handle_mm_fault()
  |     wp_huge_pmd() //remove PMD and fall back to PTE
  |     handle_pte_fault()
  |      do_pte_missing()
  |       do_fault()
  |        do_read_fault() //FAULT_FLAG_WRITE is not set
  |         finish_fault()
  |          do_set_pmd() //install a huge PMD again, this is wrong!!!
  |      do_wp_page() //create private anonymous pages
  <-    goto retry;

Because do_read_fault() incorrectly installs a huge PMD again,
follow_pmd_mask() always returns -EMLINK, causing an infinite loop.
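
The retry loop in the trace above can be illustrated with a deliberately
simplified, single-page user-space model. This is only a sketch: every
name below (toy_map, toy_follow_ok(), toy_fault(), toy_pin()) is
hypothetical and not a kernel API, and the max_rounds bound exists only
so the model terminates; the real loop has no such bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
enum toy_map { TOY_SHARED_PMD, TOY_SHARED_PTE, TOY_ANON_PTE };

/*
 * Stand-in for follow_page_mask(): a long-term pin in a CoW VMA is
 * only satisfied by a private anonymous page.
 */
bool toy_follow_ok(enum toy_map m)
{
	return m == TOY_ANON_PTE;
}

/* Stand-in for faultin_page() handling an unshare fault. */
enum toy_map toy_fault(enum toy_map m, bool skip_pmd_on_cow)
{
	switch (m) {
	case TOY_SHARED_PMD:
		/*
		 * wp_huge_pmd() drops the huge PMD, then the read-fault
		 * path reaches do_set_pmd(), which either reinstalls
		 * the huge PMD or falls back to a PTE mapping.
		 */
		return skip_pmd_on_cow ? TOY_SHARED_PTE : TOY_SHARED_PMD;
	case TOY_SHARED_PTE:
		/* do_wp_page() creates the private anonymous copy. */
		return TOY_ANON_PTE;
	default:
		return m;
	}
}

/* Returns the number of faults taken, or -1 if max_rounds is exhausted. */
int toy_pin(bool skip_pmd_on_cow, int max_rounds)
{
	enum toy_map m = TOY_SHARED_PMD;
	int faults = 0;

	while (!toy_follow_ok(m)) {
		if (faults == max_rounds)
			return -1;	/* model-only bound */
		m = toy_fault(m, skip_pmd_on_cow);
		faults++;
	}
	return faults;
}
```

In this model, toy_pin(false, 100) returns -1 because the state never
leaves TOY_SHARED_PMD, while toy_pin(true, 100) returns 2: one fault to
reach a PTE mapping and one more to create the anonymous copy.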

David pointed out that, in the future, wp_huge_pmd() could preallocate
a page table and remap the PMD range through a PTE table. For now, we
can avoid this issue by not installing PMD mappings in do_set_pmd() when
handling a COW or unshare fault.
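
The condition described above can be exercised in isolation with a small
user-space sketch. The TOY_* constants are illustrative stand-ins, not
the values from the kernel headers, and toy_is_cow_mapping() and
toy_skip_pmd() are hypothetical helpers that only mirror the shape of
the check (a private, possibly writable mapping combined with a write or
unshare fault).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in values; the real flags live in kernel headers. */
#define TOY_VM_SHARED		0x08UL
#define TOY_VM_MAYWRITE		0x20UL
#define TOY_FAULT_FLAG_WRITE	(1U << 0)
#define TOY_FAULT_FLAG_UNSHARE	(1U << 1)

/* A CoW mapping is not shared but may be made writable. */
bool toy_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (TOY_VM_SHARED | TOY_VM_MAYWRITE)) ==
	       TOY_VM_MAYWRITE;
}

/* True when the PMD mapping would be skipped in favour of PTEs. */
bool toy_skip_pmd(unsigned long vm_flags, unsigned int fault_flags)
{
	return toy_is_cow_mapping(vm_flags) &&
	       (fault_flags & (TOY_FAULT_FLAG_WRITE | TOY_FAULT_FLAG_UNSHARE));
}
```

Under these assumptions, a plain read fault on a private mapping and any
fault on a shared mapping still take the PMD path; only write and
unshare faults on a CoW mapping fall back to PTEs.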

Fixes: a7f226604170 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
Reported-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Closes: https://lore.kernel.org/linux-ext4/844e5cd4-462e-4b88-b3b5-816465a3b7e3@linux.intel.com/
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
v2->v3:
 - Update comments to clarify why we shouldn't install PMD mappings
   while doing CoW.

 mm/memory.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..b1aed4f08224 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5520,6 +5520,17 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *pa
 	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
 		return ret;
 
+	/*
+	 * We're about to trigger a write or unshare fault on a CoW
+	 * mapping, breaking the shared folio into private anonymous
+	 * copies at PTE granularity.  A PMD mapping would bind an
+	 * entire PMD-sized range to the shared folio, defeating CoW.
+	 * Fall back to direct PTE mapping.
+	 */
+	if (is_cow_mapping(vma->vm_flags) &&
+	    (vmf->flags & (FAULT_FLAG_WRITE | FAULT_FLAG_UNSHARE)))
+		return ret;
+
 	if (!is_pmd_order(folio_order(folio)))
 		return ret;
 	page = &folio->page;
-- 
2.52.0



