From: Lance Yang <lance.yang@linux.dev>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: akpm@linux-foundation.org, david@redhat.com, 21cnbao@gmail.com,
baolin.wang@linux.alibaba.com, chrisl@kernel.org,
kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
huang.ying.caritas@gmail.com, zhengtangquan@oppo.com,
riel@surriel.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
mingzhe.yang@ly.com, stable@vger.kernel.org,
Barry Song <baohua@kernel.org>, Lance Yang <ioworker0@gmail.com>
Subject: Re: [PATCH v4 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
Date: Mon, 7 Jul 2025 17:13:24 +0800
Message-ID: <072268ae-3dea-46f8-8c9e-203d062eab82@linux.dev>
In-Reply-To: <aGtdwn0bLlO2FzZ6@harry>
On 2025/7/7 13:40, Harry Yoo wrote:
> On Tue, Jul 01, 2025 at 10:31:00PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>> may read past the end of a PTE table when a large folio's PTE mappings
>> are not fully contained within a single page table.
>>
>> While this scenario might be rare, an issue triggerable from userspace must
>> be fixed regardless of its likelihood. This patch fixes the out-of-bounds
>> access by refactoring the logic into a new helper, folio_unmap_pte_batch().
>>
>> The new helper correctly calculates the safe batch size by capping the scan
>> at both the VMA and PMD boundaries. To simplify the code, it also supports
>> partial batching (i.e., any number of pages from 1 up to the calculated
>> safe maximum), as there is no strong reason to special-case for fully
>> mapped folios.
>>
>> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>>
>> Cc: <stable@vger.kernel.org>
>> Reported-by: David Hildenbrand <david@redhat.com>
>> Closes: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
>> Suggested-by: Barry Song <baohua@kernel.org>
>> Acked-by: Barry Song <baohua@kernel.org>
>> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>
> LGTM,
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Hi Harry,

Thanks for taking the time to review!
>
> With a minor comment below.
>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index fb63d9256f09..1320b88fab74 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> hugetlb_remove_rmap(folio);
>> } else {
>> folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
>> - folio_ref_sub(folio, nr_pages - 1);
>> }
>> if (vma->vm_flags & VM_LOCKED)
>> mlock_drain_local();
>> - folio_put(folio);
>> - /* We have already batched the entire folio */
>> - if (nr_pages > 1)
>> + folio_put_refs(folio, nr_pages);
>> +
>> + /*
>> + * If we are sure that we batched the entire folio and cleared
>> + * all PTEs, we can just optimize and stop right here.
>> + */
>> + if (nr_pages == folio_nr_pages(folio))
>> goto walk_done;
>
> Just a minor comment.
>
> We should probably teach page_vma_mapped_walk() to skip nr_pages pages,
> or just rely on next_pte: do { ... } while (pte_none(ptep_get(pvmw->pte)))
> loop in page_vma_mapped_walk() to skip those ptes?
Good point. We handle partially-mapped folios by relying on the "next_pte"
loop to skip those ptes. The common case we expect to handle is fully-mapped
folios.
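
To illustrate that fallback, here is a minimal userspace sketch, not kernel
code. It assumes a plain array stands in for a PTE table and that a zero
entry stands in for pte_none(); the helper name next_present() is
hypothetical. After a partial batch clears some entries, a "next_pte"-style
loop advances past the cleared slots and stops at the next populated one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a "next_pte"-style skip loop.
 * ptes[] stands in for one PTE table; 0 plays the role of pte_none().
 * Returns the index of the next populated entry after idx, or end if
 * every remaining entry is empty.
 */
static size_t next_present(const uint64_t *ptes, size_t idx, size_t end)
{
	do {
		idx++;
	} while (idx < end && ptes[idx] == 0);

	return idx;
}
```

In this model the cost of skipping is one read per cleared entry, which is
why the early exit only pays off when the whole folio was batched.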
>
> Taking different paths depending on (nr_pages == folio_nr_pages(folio))
> doesn't seem sensible.
Adding more logic to page_vma_mapped_walk() for the rare partial-folio
case seems like an over-optimization that would complicate the walker.
So, I'd prefer to keep it as is for now ;)
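
For readers following the thread, the capping described in the commit
message can be sketched in userspace as below. This is an illustration under
stated assumptions, not the code from the patch: it assumes 4 KiB pages and
2 MiB PMD coverage, and the MODEL_* macros and model_* helpers are
hypothetical names. The scan is limited to whichever comes first: the end of
the VMA, the end of the current PTE table, or the end of the folio.

```c
#include <stdint.h>

/* Assumptions for this model: 4 KiB pages, one PTE table covers 2 MiB. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_SIZE   (UINT64_C(1) << 21)
#define MODEL_PMD_MASK   (~(MODEL_PMD_SIZE - 1))

/* Next PMD boundary after addr, or end if that comes first. */
static uint64_t model_pmd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + MODEL_PMD_SIZE) & MODEL_PMD_MASK;

	/* The "- 1" keeps the comparison correct if boundary wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Upper bound on how many PTEs may be scanned starting at addr without
 * leaving the VMA or the current PTE table, capped by the folio size.
 */
static uint64_t model_max_batch(uint64_t addr, uint64_t vma_end,
				uint64_t folio_pages)
{
	uint64_t end = model_pmd_addr_end(addr, vma_end);
	uint64_t max_nr = (end - addr) >> MODEL_PAGE_SHIFT;

	return folio_pages < max_nr ? folio_pages : max_nr;
}
```

With these assumptions, a 16-page folio whose mapping starts 4 pages before
a PMD boundary yields a cap of 4, so the remaining 12 pages are handled on a
later iteration instead of being read past the end of the table.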