From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, david@kernel.org, ljs@kernel.org,
chrisl@kernel.org, kasong@tencent.com, hughd@google.com,
liam@infradead.org
Cc: Dev Jain <dev.jain@arm.com>,
riel@surriel.com, vbabka@kernel.org, harry@kernel.org,
jannh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, qi.zheng@linux.dev, shakeel.butt@linux.dev,
baohua@kernel.org, axelrasmussen@google.com, yuanchu@google.com,
weixugc@google.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
bhe@redhat.com, youngjun.park@lge.com,
baolin.wang@linux.alibaba.com, pfalcato@suse.de,
ryan.roberts@arm.com, anshuman.khandual@arm.com
Subject: [PATCH v4 02/12] mm/rmap: Add try_to_unmap_hugetlb_one
Date: Tue, 26 May 2026 12:06:25 +0530
Message-ID: <20260526063635.61721-3-dev.jain@arm.com>
In-Reply-To: <20260526063635.61721-1-dev.jain@arm.com>

Simplify try_to_unmap_one() by moving the hugetlb parts into
try_to_unmap_hugetlb_one().

The refactoring is correct because of the following points:
1. try_to_unmap() is called for hugetlb folios only when they are
hwpoisoned.
2. A hugetlb VMA cannot be mlocked.
3. The pvmw API sets pvmw.pte to the base of the hugetlb folio (pvmw.pmd
is NULL).
4. We won't ever process a softleaf entry that encodes a hugetlb folio;
hugetlb folios are never swapped out, migration entries will be skipped
(PVMW_MIGRATION not passed) and device-exclusive does not work for
hugetlb.
5. The uffd-wp bit is lost when pvmw.pte is converted to a hwpoison
softleaf entry, so there is no need to call
pte_install_uffd_wp_if_needed() after clearing pvmw.pte.
6. TTU_HWPOISON is always set; for it to be absent, either the folio
has to be in the swapcache or mapping_can_writeback() has to be true
(see unmap_poisoned_folio()), neither of which holds for hugetlb folios.
7. hugetlb uses separate counters from normal rss counters, therefore
update_highwater_rss() need not be called.
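
Point 6 can be illustrated with a small userspace model. This is only a
sketch under stated assumptions: the flag values, struct hwp_folio_model
and model_poison_ttu_flags() are hypothetical names invented for this
note, not kernel code. The model clears the flag whenever either
condition named above holds, which may be more often than the kernel
actually does.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define MODEL_TTU_SYNC     (1u << 0)
#define MODEL_TTU_HWPOISON (1u << 1)

/* Hypothetical stand-in for the two folio properties that matter here. */
struct hwp_folio_model {
	bool in_swapcache;       /* models folio_test_swapcache() */
	bool mapping_writeback;  /* models mapping_can_writeback() */
};

/*
 * Start with TTU_HWPOISON set and drop it only in the two cases the
 * commit message names.
 */
static unsigned int model_poison_ttu_flags(const struct hwp_folio_model *f)
{
	unsigned int ttu = MODEL_TTU_SYNC | MODEL_TTU_HWPOISON;

	if (f->in_swapcache)
		ttu &= ~MODEL_TTU_HWPOISON;
	if (f->mapping_writeback)
		ttu &= ~MODEL_TTU_HWPOISON;
	return ttu;
}

/* Per point 6, a hugetlb folio satisfies neither condition. */
static const struct hwp_folio_model model_hugetlb_folio   = { false, false };
static const struct hwp_folio_model model_swapcache_folio = { true,  false };
static const struct hwp_folio_model model_writeback_folio = { false, true  };
```

With neither condition true, the flag survives, which is why the new
function can assert on TTU_HWPOISON instead of branching on it.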
While at it:
- Change VM_BUG_* to VM_WARN_*.
- Do not declare variables that are used only once.
- Assert that the subpage derived by the pvmw walk is exactly the head
page. This is because try_to_unmap() does not remember the specific
subpage which was hwpoisoned, and, since a hugetlb folio cannot be
partially munmapped or mremapped, the first pte mapping the hugetlb
folio (in case of a contpte or contpmd mapped folio) cannot ever point
to an intermediate page.
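
The head-page assertion boils down to PFN arithmetic, sketched below as
a userspace model. Everything here is an assumption made for
illustration: struct span_folio_model, the helper names, and the example
folio (512 pages starting at PFN 0x80000, i.e. 2 MiB with 4 KiB base
pages) are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in: a folio is nr_pages contiguous PFNs. */
struct span_folio_model {
	unsigned long first_pfn;  /* models folio_pfn(folio) */
	unsigned long nr_pages;   /* models folio_nr_pages(folio) */
};

/*
 * Index of the subpage a PTE points at, mirroring the
 * pte_pfn(pteval) - folio_pfn(folio) computation in the patch.
 */
static unsigned long model_subpage_index(const struct span_folio_model *folio,
					 unsigned long pte_pfn)
{
	return pte_pfn - folio->first_pfn;
}

/* True when the walk landed on the head page (index 0). */
static bool model_maps_head_page(const struct span_folio_model *folio,
				 unsigned long pte_pfn)
{
	return model_subpage_index(folio, pte_pfn) == 0;
}

/* Example folio for the checks in this note (assumed sizes). */
static const struct span_folio_model model_2m_folio = { 0x80000UL, 512UL };
```

Only a PTE whose PFN equals the folio's first PFN passes the check; any
other PFN inside the folio would trip the new VM_WARN_ON().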
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
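The trylock in __try_to_unmap_hugetlb_one() exists because the caller
already holds i_mmap_rwsem while the documented order takes the hugetlb
vma_lock first, so blocking on vma_lock is not an option. A minimal
userspace sketch of that "try once, report failure" pattern follows. It
is a model under stated assumptions, not kernel code: struct model_lock
is a hypothetical test-and-set flag standing in for the rw-semaphore,
and model_unmap_one() is an invented name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hugetlb vma_lock. */
struct model_lock {
	atomic_flag held;
};

static bool model_trylock(struct model_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&l->held);
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * The caller is assumed to hold the outer lock already. Waiting for
 * vma_lock here could deadlock against a thread that takes the two
 * locks in the documented order, so try once and report failure.
 */
static bool model_unmap_one(struct model_lock *vma_lock, int *unmapped)
{
	if (!model_trylock(vma_lock))
		return false;  /* mirrors: ret = false; goto walk_done; */

	(*unmapped)++;         /* stands in for the actual unmap work */
	model_unlock(vma_lock);
	return true;
}
```

A false return leaves the mapping untouched, so the caller sees the
folio as still mapped and can decide how to proceed.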
mm/rmap.c | 203 ++++++++++++++++++++++++++++++++----------------------
1 file changed, 121 insertions(+), 82 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 430c91c8fe2ae..06ab1158d4cd1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1978,6 +1978,121 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
}
+static bool __try_to_unmap_hugetlb_one(struct folio *folio,
+ struct vm_area_struct *vma, struct page_vma_mapped_walk *pvmw,
+ struct mmu_notifier_range *range, enum ttu_flags flags)
+{
+ unsigned long hsz = huge_page_size(hstate_vma(vma));
+ unsigned long address = pvmw->address;
+ struct mm_struct *mm = vma->vm_mm;
+ struct page *subpage;
+ bool ret = true;
+ pte_t pteval;
+
+ if (!page_vma_mapped_walk(pvmw))
+ return true;
+
+ pteval = ptep_get(pvmw->pte);
+ VM_WARN_ON(!pte_present(pteval));
+ subpage = folio_page(folio, pte_pfn(pteval) - folio_pfn(folio));
+ VM_WARN_ON(folio_page(folio, 0) != subpage);
+
+ /*
+ * huge_pmd_unshare may unmap an entire PMD page. There is no way of
+ * knowing exactly which PMDs may be cached for this mm, so we must
+ * flush them all. The caller has already adjusted range->start/end to
+ * cover this range.
+ */
+ flush_cache_range(vma, range->start, range->end);
+
+ /*
+ * To call huge_pmd_unshare, i_mmap_rwsem must be held in write mode.
+ * Caller needs to explicitly do this outside rmap routines.
+ *
+ * We also must hold hugetlb vma_lock in write mode. Lock order dictates
+ * acquiring vma_lock BEFORE i_mmap_rwsem. We can only try lock here and
+ * fail if unsuccessful.
+ */
+ if (!folio_test_anon(folio)) {
+ struct mmu_gather tlb;
+
+ VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
+ if (!hugetlb_vma_trylock_write(vma)) {
+ ret = false;
+ goto walk_done;
+ }
+
+ tlb_gather_mmu_vma(&tlb, vma);
+ if (huge_pmd_unshare(&tlb, vma, address, pvmw->pte)) {
+ hugetlb_vma_unlock_write(vma);
+ huge_pmd_unshare_flush(&tlb, vma);
+ tlb_finish_mmu(&tlb);
+ /*
+ * The PMD table was unmapped, consequently unmapping
+ * the folio.
+ */
+ goto walk_done;
+ }
+ hugetlb_vma_unlock_write(vma);
+ tlb_finish_mmu(&tlb);
+ }
+ pteval = huge_ptep_clear_flush(vma, address, pvmw->pte);
+ if (pte_dirty(pteval))
+ folio_mark_dirty(folio);
+
+ VM_WARN_ON(!(flags & TTU_HWPOISON));
+ pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+ hugetlb_count_sub(folio_nr_pages(folio), mm);
+ set_huge_pte_at(mm, address, pvmw->pte, pteval, hsz);
+ hugetlb_remove_rmap(folio);
+ folio_put_refs(folio, 1);
+
+walk_done:
+ page_vma_mapped_walk_done(pvmw);
+ return ret;
+}
+
+static bool try_to_unmap_hugetlb_one(struct folio *folio,
+ struct vm_area_struct *vma, unsigned long address, void *arg)
+{
+ DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+ struct mmu_notifier_range range;
+ enum ttu_flags flags = (enum ttu_flags)(long)arg;
+ bool ret;
+
+ /*
+ * The try_to_unmap() is only passed a hugetlb folio in the case
+ * where the hugetlb folio contains a poisoned page.
+ */
+ VM_WARN_ON_FOLIO(!folio_contain_hwpoisoned_page(folio), folio);
+
+ /*
+ * When racing against e.g. zap_pte_range() on another cpu,
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
+ * try_to_unmap() may return before folio_mapped() has become false,
+ * if page table locking is skipped: use TTU_SYNC to wait for that.
+ */
+ if (flags & TTU_SYNC)
+ pvmw.flags = PVMW_SYNC;
+
+ /*
+ * For hugetlb, it could be much worse than THP if we need pud
+ * invalidation in the case of pmd sharing.
+ *
+ * Note that the folio can not be freed in this function as call of
+ * try_to_unmap() must hold a reference on the folio.
+ */
+ range.end = vma_address_end(&pvmw);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
+ address, range.end);
+ adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
+ mmu_notifier_invalidate_range_start(&range);
+ ret = __try_to_unmap_hugetlb_one(folio, vma, &pvmw, &range,
+ flags);
+ mmu_notifier_invalidate_range_end(&range);
+ return ret;
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
@@ -1993,7 +2108,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
enum ttu_flags flags = (enum ttu_flags)(long)arg;
unsigned long nr_pages = 1, end_addr;
unsigned long pfn;
- unsigned long hsz = 0;
int ptes = 0;
/*
@@ -2007,8 +2121,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
/*
* For THP, we have to assume the worse case ie pmd for invalidation.
- * For hugetlb, it could be much worse if we need to do pud
- * invalidation in the case of pmd sharing.
*
* Note that the folio can not be freed in this function as call of
* try_to_unmap() must hold a reference on the folio.
@@ -2016,17 +2128,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
range.end = vma_address_end(&pvmw);
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
address, range.end);
- if (folio_test_hugetlb(folio)) {
- /*
- * If sharing is possible, start and end will be adjusted
- * accordingly.
- */
- adjust_range_if_pmd_sharing_possible(vma, &range.start,
- &range.end);
-
- /* We need the huge page size for set_huge_pte_at() */
- hsz = huge_page_size(hstate_vma(vma));
- }
mmu_notifier_invalidate_range_start(&range);
while (page_vma_mapped_walk(&pvmw)) {
@@ -2104,7 +2205,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
const softleaf_t entry = softleaf_from_pte(pteval);
pfn = softleaf_to_pfn(entry);
- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
}
subpage = folio_page(folio, pfn - folio_pfn(folio));
@@ -2112,59 +2212,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
anon_exclusive = folio_test_anon(folio) &&
PageAnonExclusive(subpage);
- if (folio_test_hugetlb(folio)) {
- bool anon = folio_test_anon(folio);
-
- /*
- * The try_to_unmap() is only passed a hugetlb folio
- * in the case where the hugetlb folio contains a
- * poisoned page.
- */
- VM_WARN_ON_FOLIO(!folio_contain_hwpoisoned_page(folio), folio);
- /*
- * huge_pmd_unshare may unmap an entire PMD page.
- * There is no way of knowing exactly which PMDs may
- * be cached for this mm, so we must flush them all.
- * start/end were already adjusted above to cover this
- * range.
- */
- flush_cache_range(vma, range.start, range.end);
-
- /*
- * To call huge_pmd_unshare, i_mmap_rwsem must be
- * held in write mode. Caller needs to explicitly
- * do this outside rmap routines.
- *
- * We also must hold hugetlb vma_lock in write mode.
- * Lock order dictates acquiring vma_lock BEFORE
- * i_mmap_rwsem. We can only try lock here and fail
- * if unsuccessful.
- */
- if (!anon) {
- struct mmu_gather tlb;
-
- VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
- if (!hugetlb_vma_trylock_write(vma))
- goto walk_abort;
-
- tlb_gather_mmu_vma(&tlb, vma);
- if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
- hugetlb_vma_unlock_write(vma);
- huge_pmd_unshare_flush(&tlb, vma);
- tlb_finish_mmu(&tlb);
- /*
- * The PMD table was unmapped,
- * consequently unmapping the folio.
- */
- goto walk_done;
- }
- hugetlb_vma_unlock_write(vma);
- tlb_finish_mmu(&tlb);
- }
- pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
- if (pte_dirty(pteval))
- folio_mark_dirty(folio);
- } else if (likely(pte_present(pteval))) {
+ if (likely(pte_present(pteval))) {
nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
end_addr = address + nr_pages * PAGE_SIZE;
flush_cache_range(vma, address, end_addr);
@@ -2201,14 +2249,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (folio_contain_hwpoisoned_page(folio) && (flags & TTU_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
- if (folio_test_hugetlb(folio)) {
- hugetlb_count_sub(folio_nr_pages(folio), mm);
- set_huge_pte_at(mm, address, pvmw.pte, pteval,
- hsz);
- } else {
- dec_mm_counter(mm, mm_counter(folio));
- set_pte_at(mm, address, pvmw.pte, pteval);
- }
+ dec_mm_counter(mm, mm_counter(folio));
+ set_pte_at(mm, address, pvmw.pte, pteval);
} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
!userfaultfd_armed(vma)) {
/*
@@ -2341,11 +2383,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
}
discard:
- if (unlikely(folio_test_hugetlb(folio))) {
- hugetlb_remove_rmap(folio);
- } else {
- folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
- }
+ folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put_refs(folio, nr_pages);
@@ -2393,7 +2431,8 @@ static int folio_not_mapped(struct folio *folio)
void try_to_unmap(struct folio *folio, enum ttu_flags flags)
{
struct rmap_walk_control rwc = {
- .rmap_one = try_to_unmap_one,
+ .rmap_one = folio_test_hugetlb(folio) ?
+ try_to_unmap_hugetlb_one : try_to_unmap_one,
.arg = (void *)flags,
.done = folio_not_mapped,
.anon_lock = folio_lock_anon_vma_read,
--
2.34.1
Thread overview:
2026-05-26 6:36 [PATCH v4 00/12] Optimize anonymous large folio unmapping Dev Jain
2026-05-26 6:36 ` [PATCH v4 01/12] mm/rmap: convert page -> folio for hwpoison checks Dev Jain
2026-05-26 6:36 ` Dev Jain [this message]
2026-05-26 6:36 ` [PATCH v4 03/12] mm/rmap: refactor some code around lazyfree folio unmapping Dev Jain
2026-05-26 6:36 ` [PATCH v4 04/12] mm/memory: Batch set uffd-wp markers during zapping Dev Jain
2026-05-26 6:36 ` [PATCH v4 05/12] mm/rmap: batch unmap folios belonging to uffd-wp VMAs Dev Jain
2026-05-26 6:36 ` [PATCH v4 06/12] mm/swap: rename subpage->page in folio_dup_swap/folio_put_swap Dev Jain
2026-05-26 6:36 ` [PATCH v4 07/12] mm/swapfile: Add batched version of folio_dup_swap Dev Jain
2026-05-26 6:36 ` [PATCH v4 08/12] mm/swapfile: Add batched version of folio_put_swap Dev Jain
2026-05-26 6:36 ` [PATCH v4 09/12] mm/rmap: Add batched version of folio_try_share_anon_rmap_pte Dev Jain
2026-05-26 6:36 ` [PATCH v4 10/12] mm/rmap: refactor anon folio unmap in try_to_unmap_one Dev Jain
2026-05-26 6:36 ` [PATCH v4 11/12] mm/mprotect: drop 'sub' from page_anon_exclusive_sub_batch Dev Jain
2026-05-26 6:36 ` [PATCH v4 12/12] mm/rmap: enable batch unmapping of anonymous folios Dev Jain