linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
@ 2025-06-30  1:13 Lance Yang
  2025-06-30 13:39 ` Lorenzo Stoakes
  2025-07-01 14:03 ` David Hildenbrand
  0 siblings, 2 replies; 5+ messages in thread
From: Lance Yang @ 2025-06-30  1:13 UTC (permalink / raw)
  To: akpm, david, 21cnbao
  Cc: baolin.wang, chrisl, ioworker0, kasong, linux-arm-kernel,
	linux-kernel, linux-mm, linux-riscv, lorenzo.stoakes,
	ryan.roberts, v-songbaohua, x86, huang.ying.caritas,
	zhengtangquan, riel, Liam.Howlett, vbabka, harry.yoo,
	mingzhe.yang, stable, Barry Song, Lance Yang

From: Lance Yang <lance.yang@linux.dev>

As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
may read past the end of a PTE table when a large folio's PTE mappings
are not fully contained within a single page table.

While this scenario might be rare, an issue triggerable from userspace must
be fixed regardless of its likelihood. This patch fixes the out-of-bounds
access by refactoring the logic into a new helper, folio_unmap_pte_batch().

The new helper correctly calculates the safe batch size by capping the scan
at both the VMA and PMD boundaries. To simplify the code, it also supports
partial batching (i.e., any number of pages from 1 up to the calculated
safe maximum), as there is no strong reason to special-case for fully
mapped folios.

[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
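
(Illustration only, not part of the patch: a stand-alone user-space sketch of
the boundary math described above. The pmd_addr_end() model, the 4 KiB page /
2 MiB PMD geometry, and the example addresses are assumptions made purely for
the example.)

  #include <stdio.h>

  #define PAGE_SHIFT 12
  #define PMD_SIZE   (1UL << 21)
  #define PMD_MASK   (~(PMD_SIZE - 1))

  /* Rough user-space model of the kernel's pmd_addr_end(): the next PMD
   * boundary after addr, clamped to 'end'. */
  static unsigned long pmd_addr_end(unsigned long addr, unsigned long end)
  {
          unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

          return boundary < end ? boundary : end;
  }

  int main(void)
  {
          unsigned long addr   = 0x7f00001fe000UL; /* 2 pages below a PMD boundary */
          unsigned long vm_end = 0x7f0000400000UL; /* VMA crosses that boundary */
          unsigned int max_nr  = (pmd_addr_end(addr, vm_end) - addr) >> PAGE_SHIFT;

          /* A 16-page folio mapped here may only be batched as 2 PTEs, not 16;
           * scanning 16 would read past the end of the PTE table. */
          printf("max_nr = %u\n", max_nr); /* prints 2 */
          return 0;
  }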

Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Cc: <stable@vger.kernel.org>
Acked-by: Barry Song <baohua@kernel.org>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Barry Song <baohua@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
v2 -> v3:
 - Tweak changelog (per Barry and David)
 - Pick AB from Barry - thanks!
 - https://lore.kernel.org/linux-mm/20250627062319.84936-1-lance.yang@linux.dev

v1 -> v2:
 - Update subject and changelog (per Barry)
 - https://lore.kernel.org/linux-mm/20250627025214.30887-1-lance.yang@linux.dev

 mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index fb63d9256f09..1320b88fab74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }
 
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
-			struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+			struct page_vma_mapped_walk *pvmw,
+			enum ttu_flags flags, pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
-	int max_nr = folio_nr_pages(folio);
-	pte_t pte = ptep_get(ptep);
+	unsigned long end_addr, addr = pvmw->address;
+	struct vm_area_struct *vma = pvmw->vma;
+	unsigned int max_nr;
+
+	if (flags & TTU_HWPOISON)
+		return 1;
+	if (!folio_test_large(folio))
+		return 1;
 
+	/* We may only batch within a single VMA and a single page table. */
+	end_addr = pmd_addr_end(addr, vma->vm_end);
+	max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+	/* We only support lazyfree batching for now ... */
 	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
-		return false;
+		return 1;
 	if (pte_unused(pte))
-		return false;
-	if (pte_pfn(pte) != folio_pfn(folio))
-		return false;
+		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
-			       NULL, NULL) == max_nr;
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+			       NULL, NULL, NULL);
 }
 
 /*
@@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (pte_dirty(pteval))
 				folio_mark_dirty(folio);
 		} else if (likely(pte_present(pteval))) {
-			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
-			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
-				nr_pages = folio_nr_pages(folio);
+			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
 			end_addr = address + nr_pages * PAGE_SIZE;
 			flush_cache_range(vma, address, end_addr);
 
@@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			hugetlb_remove_rmap(folio);
 		} else {
 			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
-			folio_ref_sub(folio, nr_pages - 1);
 		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
-		folio_put(folio);
-		/* We have already batched the entire folio */
-		if (nr_pages > 1)
+		folio_put_refs(folio, nr_pages);
+
+		/*
+		 * If we are sure that we batched the entire folio and cleared
+		 * all PTEs, we can just optimize and stop right here.
+		 */
+		if (nr_pages == folio_nr_pages(folio))
 			goto walk_done;
 		continue;
 walk_abort:
-- 
2.49.0




* Re: [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
  2025-06-30  1:13 [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap Lance Yang
@ 2025-06-30 13:39 ` Lorenzo Stoakes
  2025-06-30 13:41   ` David Hildenbrand
  2025-07-01 14:03 ` David Hildenbrand
  1 sibling, 1 reply; 5+ messages in thread
From: Lorenzo Stoakes @ 2025-06-30 13:39 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, 21cnbao, baolin.wang, chrisl, kasong,
	linux-arm-kernel, linux-kernel, linux-mm, linux-riscv,
	ryan.roberts, v-songbaohua, x86, huang.ying.caritas,
	zhengtangquan, riel, Liam.Howlett, vbabka, harry.yoo,
	mingzhe.yang, stable, Barry Song, Lance Yang

On Mon, Jun 30, 2025 at 09:13:05AM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
> may read past the end of a PTE table when a large folio's PTE mappings
> are not fully contained within a single page table.
>
> While this scenario might be rare, an issue triggerable from userspace must
> be fixed regardless of its likelihood. This patch fixes the out-of-bounds
> access by refactoring the logic into a new helper, folio_unmap_pte_batch().
>
> The new helper correctly calculates the safe batch size by capping the scan
> at both the VMA and PMD boundaries. To simplify the code, it also supports
> partial batching (i.e., any number of pages from 1 up to the calculated
> safe maximum), as there is no strong reason to special-case for fully
> mapped folios.
>
> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Cc: <stable@vger.kernel.org>
> Acked-by: Barry Song <baohua@kernel.org>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Barry Song <baohua@kernel.org>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>

This LGTM:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
> v2 -> v3:
>  - Tweak changelog (per Barry and David)
>  - Pick AB from Barry - thanks!
>  - https://lore.kernel.org/linux-mm/20250627062319.84936-1-lance.yang@linux.dev
>
> v1 -> v2:
>  - Update subject and changelog (per Barry)
>  - https://lore.kernel.org/linux-mm/20250627025214.30887-1-lance.yang@linux.dev
>
>  mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
>  1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb63d9256f09..1320b88fab74 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
>  #endif
>  }
>
> -/* We support batch unmapping of PTEs for lazyfree large folios */
> -static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
> -			struct folio *folio, pte_t *ptep)
> +static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> +			struct page_vma_mapped_walk *pvmw,
> +			enum ttu_flags flags, pte_t pte)
>  {
>  	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> -	int max_nr = folio_nr_pages(folio);
> -	pte_t pte = ptep_get(ptep);
> +	unsigned long end_addr, addr = pvmw->address;
> +	struct vm_area_struct *vma = pvmw->vma;
> +	unsigned int max_nr;
> +
> +	if (flags & TTU_HWPOISON)
> +		return 1;
> +	if (!folio_test_large(folio))
> +		return 1;
>
> +	/* We may only batch within a single VMA and a single page table. */
> +	end_addr = pmd_addr_end(addr, vma->vm_end);
> +	max_nr = (end_addr - addr) >> PAGE_SHIFT;
> +
> +	/* We only support lazyfree batching for now ... */
>  	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
> -		return false;
> +		return 1;
>  	if (pte_unused(pte))
> -		return false;
> -	if (pte_pfn(pte) != folio_pfn(folio))
> -		return false;
> +		return 1;
>
> -	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
> -			       NULL, NULL) == max_nr;
> +	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
> +			       NULL, NULL, NULL);

I guess this will conflict with David's changes, though perhaps less so in this
simpler case, given this was existing code? Anyway, let's see.

>  }
>
>  /*
> @@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			if (pte_dirty(pteval))
>  				folio_mark_dirty(folio);
>  		} else if (likely(pte_present(pteval))) {
> -			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
> -			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
> -				nr_pages = folio_nr_pages(folio);
> +			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
>  			end_addr = address + nr_pages * PAGE_SIZE;
>  			flush_cache_range(vma, address, end_addr);
>
> @@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			hugetlb_remove_rmap(folio);
>  		} else {
>  			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
> -			folio_ref_sub(folio, nr_pages - 1);
>  		}
>  		if (vma->vm_flags & VM_LOCKED)
>  			mlock_drain_local();
> -		folio_put(folio);
> -		/* We have already batched the entire folio */
> -		if (nr_pages > 1)
> +		folio_put_refs(folio, nr_pages);
> +
> +		/*
> +		 * If we are sure that we batched the entire folio and cleared
> +		 * all PTEs, we can just optimize and stop right here.
> +		 */
> +		if (nr_pages == folio_nr_pages(folio))
>  			goto walk_done;
>  		continue;
>  walk_abort:
> --
> 2.49.0
>



* Re: [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
  2025-06-30 13:39 ` Lorenzo Stoakes
@ 2025-06-30 13:41   ` David Hildenbrand
  0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2025-06-30 13:41 UTC (permalink / raw)
  To: Lorenzo Stoakes, Lance Yang
  Cc: akpm, 21cnbao, baolin.wang, chrisl, kasong, linux-arm-kernel,
	linux-kernel, linux-mm, linux-riscv, ryan.roberts, v-songbaohua,
	x86, huang.ying.caritas, zhengtangquan, riel, Liam.Howlett,
	vbabka, harry.yoo, mingzhe.yang, stable, Barry Song, Lance Yang

On 30.06.25 15:39, Lorenzo Stoakes wrote:
> On Mon, Jun 30, 2025 at 09:13:05AM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>> may read past the end of a PTE table when a large folio's PTE mappings
>> are not fully contained within a single page table.
>>
>> While this scenario might be rare, an issue triggerable from userspace must
>> be fixed regardless of its likelihood. This patch fixes the out-of-bounds
>> access by refactoring the logic into a new helper, folio_unmap_pte_batch().
>>
>> The new helper correctly calculates the safe batch size by capping the scan
>> at both the VMA and PMD boundaries. To simplify the code, it also supports
>> partial batching (i.e., any number of pages from 1 up to the calculated
>> safe maximum), as there is no strong reason to special-case for fully
>> mapped folios.
>>
>> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>>
>> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
>> Cc: <stable@vger.kernel.org>
>> Acked-by: Barry Song <baohua@kernel.org>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Barry Song <baohua@kernel.org>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> 
> This LGTM:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
>> ---
>> v2 -> v3:
>>   - Tweak changelog (per Barry and David)
>>   - Pick AB from Barry - thanks!
>>   - https://lore.kernel.org/linux-mm/20250627062319.84936-1-lance.yang@linux.dev
>>
>> v1 -> v2:
>>   - Update subject and changelog (per Barry)
>>   - https://lore.kernel.org/linux-mm/20250627025214.30887-1-lance.yang@linux.dev
>>
>>   mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
>>   1 file changed, 28 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index fb63d9256f09..1320b88fab74 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
>>   #endif
>>   }
>>
>> -/* We support batch unmapping of PTEs for lazyfree large folios */
>> -static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
>> -			struct folio *folio, pte_t *ptep)
>> +static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>> +			struct page_vma_mapped_walk *pvmw,
>> +			enum ttu_flags flags, pte_t pte)
>>   {
>>   	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>> -	int max_nr = folio_nr_pages(folio);
>> -	pte_t pte = ptep_get(ptep);
>> +	unsigned long end_addr, addr = pvmw->address;
>> +	struct vm_area_struct *vma = pvmw->vma;
>> +	unsigned int max_nr;
>> +
>> +	if (flags & TTU_HWPOISON)
>> +		return 1;
>> +	if (!folio_test_large(folio))
>> +		return 1;
>>
>> +	/* We may only batch within a single VMA and a single page table. */
>> +	end_addr = pmd_addr_end(addr, vma->vm_end);
>> +	max_nr = (end_addr - addr) >> PAGE_SHIFT;
>> +
>> +	/* We only support lazyfree batching for now ... */
>>   	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
>> -		return false;
>> +		return 1;
>>   	if (pte_unused(pte))
>> -		return false;
>> -	if (pte_pfn(pte) != folio_pfn(folio))
>> -		return false;
>> +		return 1;
>>
>> -	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
>> -			       NULL, NULL) == max_nr;
>> +	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
>> +			       NULL, NULL, NULL);
> 
> I guess this will conflict with David's changes, but maybe in this simpler case
> and given this was existing code a bit less? Anyway let's see.

It's a fix, so this is fine to go in first. (I already rebased on top of
mm/mm-new, where this change is already included.)

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
  2025-06-30  1:13 [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap Lance Yang
  2025-06-30 13:39 ` Lorenzo Stoakes
@ 2025-07-01 14:03 ` David Hildenbrand
  2025-07-01 14:15   ` Lance Yang
  1 sibling, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2025-07-01 14:03 UTC (permalink / raw)
  To: Lance Yang, akpm, 21cnbao
  Cc: baolin.wang, chrisl, kasong, linux-arm-kernel, linux-kernel,
	linux-mm, linux-riscv, lorenzo.stoakes, ryan.roberts,
	v-songbaohua, x86, huang.ying.caritas, zhengtangquan, riel,
	Liam.Howlett, vbabka, harry.yoo, mingzhe.yang, stable, Barry Song,
	Lance Yang

On 30.06.25 03:13, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
> may read past the end of a PTE table when a large folio's PTE mappings
> are not fully contained within a single page table.
> 
> While this scenario might be rare, an issue triggerable from userspace must
> be fixed regardless of its likelihood. This patch fixes the out-of-bounds
> access by refactoring the logic into a new helper, folio_unmap_pte_batch().
> 
> The new helper correctly calculates the safe batch size by capping the scan
> at both the VMA and PMD boundaries. To simplify the code, it also supports
> partial batching (i.e., any number of pages from 1 up to the calculated
> safe maximum), as there is no strong reason to special-case for fully
> mapped folios.
> 
> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
> 
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Cc: <stable@vger.kernel.org>
> Acked-by: Barry Song <baohua@kernel.org>
> Suggested-by: David Hildenbrand <david@redhat.com>

Realized this now: This should probably be a "Reported-by:" with the
"Closes:" and a link to my mail.

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
  2025-07-01 14:03 ` David Hildenbrand
@ 2025-07-01 14:15   ` Lance Yang
  0 siblings, 0 replies; 5+ messages in thread
From: Lance Yang @ 2025-07-01 14:15 UTC (permalink / raw)
  To: David Hildenbrand, akpm, 21cnbao
  Cc: baolin.wang, chrisl, kasong, linux-arm-kernel, linux-kernel,
	linux-mm, linux-riscv, lorenzo.stoakes, ryan.roberts,
	v-songbaohua, x86, huang.ying.caritas, zhengtangquan, riel,
	Liam.Howlett, vbabka, harry.yoo, mingzhe.yang, stable, Barry Song,
	Lance Yang



On 2025/7/1 22:03, David Hildenbrand wrote:
> On 30.06.25 03:13, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>> may read past the end of a PTE table when a large folio's PTE mappings
>> are not fully contained within a single page table.
>>
>> While this scenario might be rare, an issue triggerable from userspace 
>> must
>> be fixed regardless of its likelihood. This patch fixes the out-of-bounds
>> access by refactoring the logic into a new helper, 
>> folio_unmap_pte_batch().
>>
>> The new helper correctly calculates the safe batch size by capping the 
>> scan
>> at both the VMA and PMD boundaries. To simplify the code, it also 
>> supports
>> partial batching (i.e., any number of pages from 1 up to the calculated
>> safe maximum), as there is no strong reason to special-case for fully
>> mapped folios.
>>
>> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>>
>> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large 
>> folios during reclamation")
>> Cc: <stable@vger.kernel.org>
>> Acked-by: Barry Song <baohua@kernel.org>
>> Suggested-by: David Hildenbrand <david@redhat.com>
> 
> Realized this now: This should probably be a "Reported-by:" with the 
> "Closes:" and and a link to my mail.

Got it. Both tags (Reported-by/Closes) will be in the next commit ;)



