* [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches
@ 2025-06-27 2:52 Lance Yang
2025-06-27 5:02 ` Barry Song
2025-07-16 15:21 ` patchwork-bot+linux-riscv
0 siblings, 2 replies; 4+ messages in thread
From: Lance Yang @ 2025-06-27 2:52 UTC (permalink / raw)
To: akpm, david, 21cnbao
Cc: baolin.wang, chrisl, ioworker0, kasong, linux-arm-kernel,
linux-kernel, linux-mm, linux-riscv, lorenzo.stoakes,
ryan.roberts, v-songbaohua, x86, huang.ying.caritas,
zhengtangquan, riel, Liam.Howlett, vbabka, harry.yoo,
mingzhe.yang, Barry Song, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
can read past the end of a PTE table if a large folio is mapped starting at
the last entry of that table.
So let's fix the out-of-bounds read by refactoring the logic into a new
helper, folio_unmap_pte_batch().
The new helper now correctly calculates the safe number of pages to scan by
limiting the operation to the boundaries of the current VMA and the PTE
table.
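For illustration, consider a hypothetical layout (the 512-entry table size is an
x86-64 assumption; the two clamp lines below mirror the new helper): a 16-page
folio whose first page is mapped at PTE index 510. The old code asked
folio_pte_batch() for folio_nr_pages() == 16 entries and could therefore read
slots 512..525, which lie beyond the table. Clamping to the PMD boundary and the
VMA end bounds the scan:

	/* Sketch of the clamp (same lines as in the helper below): */
	end_addr = pmd_addr_end(addr, vma->vm_end);	/* min(next PMD boundary, vma->vm_end) */
	max_nr = (end_addr - addr) >> PAGE_SHIFT;	/* in this example: 2, not 16 */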
In addition, the "all-or-nothing" batching restriction is removed to
support partial batches. The reference counting is also cleaned up to use
folio_put_refs().
[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Barry Song <baohua@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index fb63d9256f09..1320b88fab74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
#endif
}
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
- struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+ struct page_vma_mapped_walk *pvmw,
+ enum ttu_flags flags, pte_t pte)
{
const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
- int max_nr = folio_nr_pages(folio);
- pte_t pte = ptep_get(ptep);
+ unsigned long end_addr, addr = pvmw->address;
+ struct vm_area_struct *vma = pvmw->vma;
+ unsigned int max_nr;
+
+ if (flags & TTU_HWPOISON)
+ return 1;
+ if (!folio_test_large(folio))
+ return 1;
+ /* We may only batch within a single VMA and a single page table. */
+ end_addr = pmd_addr_end(addr, vma->vm_end);
+ max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+ /* We only support lazyfree batching for now ... */
if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
- return false;
+ return 1;
if (pte_unused(pte))
- return false;
- if (pte_pfn(pte) != folio_pfn(folio))
- return false;
+ return 1;
- return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
- NULL, NULL) == max_nr;
+ return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+ NULL, NULL, NULL);
}
/*
@@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (pte_dirty(pteval))
folio_mark_dirty(folio);
} else if (likely(pte_present(pteval))) {
- if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
- can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
- nr_pages = folio_nr_pages(folio);
+ nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
end_addr = address + nr_pages * PAGE_SIZE;
flush_cache_range(vma, address, end_addr);
@@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
hugetlb_remove_rmap(folio);
} else {
folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
- folio_ref_sub(folio, nr_pages - 1);
}
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
- folio_put(folio);
- /* We have already batched the entire folio */
- if (nr_pages > 1)
+ folio_put_refs(folio, nr_pages);
+
+ /*
+ * If we are sure that we batched the entire folio and cleared
+ * all PTEs, we can just optimize and stop right here.
+ */
+ if (nr_pages == folio_nr_pages(folio))
goto walk_done;
continue;
walk_abort:
--
2.49.0
* Re: [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches
2025-06-27 2:52 [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches Lance Yang
@ 2025-06-27 5:02 ` Barry Song
2025-06-27 6:14 ` Lance Yang
2025-07-16 15:21 ` patchwork-bot+linux-riscv
1 sibling, 1 reply; 4+ messages in thread
From: Barry Song @ 2025-06-27 5:02 UTC (permalink / raw)
To: Lance Yang
Cc: akpm, david, baolin.wang, chrisl, kasong, linux-arm-kernel,
linux-kernel, linux-mm, linux-riscv, lorenzo.stoakes,
ryan.roberts, v-songbaohua, x86, huang.ying.caritas,
zhengtangquan, riel, Liam.Howlett, vbabka, harry.yoo,
mingzhe.yang, Lance Yang
On Fri, Jun 27, 2025 at 2:53 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
> can read past the end of a PTE table if a large folio is mapped starting at
> the last entry of that table.
>
> So let's fix the out-of-bounds read by refactoring the logic into a new
> helper, folio_unmap_pte_batch().
>
> The new helper now correctly calculates the safe number of pages to scan by
> limiting the operation to the boundaries of the current VMA and the PTE
> table.
>
> In addition, the "all-or-nothing" batching restriction is removed to
> support partial batches. The reference counting is also cleaned up to use
> folio_put_refs().
>
> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Barry Song <baohua@kernel.org>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
I'd prefer changing the subject to something like
"Fix potential out-of-bounds page table access during batched unmap"
Supporting partial batching is a cleanup-related benefit of this fix.
It's worth mentioning that the affected cases are quite rare,
since MADV_FREE typically performs split_folio().
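For reference, the lazyfree state comes from a sequence roughly like the sketch
below (illustrative only; whether reclaim later sees an intact large folio here,
rather than split pages, depends on THP settings and the split behaviour above):

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 2UL << 20;		/* one PMD-sized (2 MiB) anon region */
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;

		madvise(buf, len, MADV_HUGEPAGE);	/* hint: back with a THP if possible */
		memset(buf, 0x5a, len);			/* fault the pages in */
		madvise(buf, len, MADV_FREE);		/* mark lazyfree; reclaim may unmap later */
		return 0;
	}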
Also, we need to Cc stable.
Thanks
Barry
* Re: [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches
2025-06-27 5:02 ` Barry Song
@ 2025-06-27 6:14 ` Lance Yang
0 siblings, 0 replies; 4+ messages in thread
From: Lance Yang @ 2025-06-27 6:14 UTC (permalink / raw)
To: Barry Song
Cc: akpm, david, baolin.wang, chrisl, kasong, linux-arm-kernel,
linux-kernel, linux-mm, linux-riscv, lorenzo.stoakes,
ryan.roberts, v-songbaohua, x86, huang.ying.caritas,
zhengtangquan, riel, Liam.Howlett, vbabka, harry.yoo,
mingzhe.yang, Lance Yang
On 2025/6/27 13:02, Barry Song wrote:
> On Fri, Jun 27, 2025 at 2:53 PM Lance Yang <ioworker0@gmail.com> wrote:
>>
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>> can read past the end of a PTE table if a large folio is mapped starting at
>> the last entry of that table.
>>
>> So let's fix the out-of-bounds read by refactoring the logic into a new
>> helper, folio_unmap_pte_batch().
>>
>> The new helper now correctly calculates the safe number of pages to scan by
>> limiting the operation to the boundaries of the current VMA and the PTE
>> table.
>>
>> In addition, the "all-or-nothing" batching restriction is removed to
>> support partial batches. The reference counting is also cleaned up to use
>> folio_put_refs().
>>
>> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>>
>> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Barry Song <baohua@kernel.org>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>
> I'd prefer changing the subject to something like
> "Fix potential out-of-bounds page table access during batched unmap"
Yep, that's much better.
>
> Supporting partial batching is a cleanup-related benefit of this fix.
> It's worth mentioning that the affected cases are quite rare,
> since MADV_FREE typically performs split_folio().
Yeah, it would be quite rare in practice ;)
>
> Also, we need to Cc stable.
Thanks! Will do.
* Re: [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches
2025-06-27 2:52 [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches Lance Yang
2025-06-27 5:02 ` Barry Song
@ 2025-07-16 15:21 ` patchwork-bot+linux-riscv
1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+linux-riscv @ 2025-07-16 15:21 UTC (permalink / raw)
To: Lance Yang
Cc: linux-riscv, akpm, david, 21cnbao, baolin.wang, chrisl, kasong,
linux-arm-kernel, linux-kernel, linux-mm, lorenzo.stoakes,
ryan.roberts, v-songbaohua, x86, huang.ying.caritas,
zhengtangquan, riel, Liam.Howlett, vbabka, harry.yoo,
mingzhe.yang, baohua, lance.yang
Hello:
This patch was applied to riscv/linux.git (fixes)
by Andrew Morton <akpm@linux-foundation.org>:
On Fri, 27 Jun 2025 10:52:14 +0800 you wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
> can read past the end of a PTE table if a large folio is mapped starting at
> the last entry of that table.
>
> So let's fix the out-of-bounds read by refactoring the logic into a new
> helper, folio_unmap_pte_batch().
>
> [...]
Here is the summary with links:
- [1/1] mm/rmap: make folio unmap batching safe and support partial batches
https://git.kernel.org/riscv/c/ddd05742b45b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html