From: Lance Yang <lance.yang@linux.dev>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
baohua@kernel.org, ioworker0@gmail.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Wei Yang <richard.weiyang@gmail.com>
Subject: Re: [PATCH mm-new v2 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present()
Date: Sat, 18 Oct 2025 00:33:33 +0800 [thread overview]
Message-ID: <b5627e83-489c-4e16-910c-fe7e56912793@linux.dev> (raw)
In-Reply-To: <699b143a-cca4-486c-a4ad-d0be561d4ab2@lucifer.local>
On 2025/10/17 23:44, Lorenzo Stoakes wrote:
> On Fri, Oct 17, 2025 at 05:38:47PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> A non-present entry, like a swap PTE, contains completely different data
>> (swap type and offset). pte_pfn() doesn't know this, so if we feed it a
>> non-present entry, it will spit out a junk PFN.
>
> It feels like this somewhat contradicts points I've made on the original series
> re the is_swap_pte() stuff. Sigh.
My bad. I didn't get your point before ...
And this patch is not intended to touch is_swap_pte() ...
>
> I guess that's _such a mess_ it's hard to avoid though.
>
> And I guess it's reasonable that !pte_present() means we can't expect a valid
> PFN though.
Yes, I think we expect a valid PFN must be under pte_present().
>
>>
>> What if that junk PFN happens to match the zeropage's PFN by sheer
>> chance? While really unlikely, this would be really bad if it did.
>>
>> So, let's fix this potential bug by ensuring all calls to is_zero_pfn()
>> in khugepaged.c are properly guarded by a pte_present() check.
>>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Not sure I really suggested something that strictly contradicts points I
> made... but I guess I did suggest guarding this stuff more carefully.
Sorry, I didn't catch you again ... Will drop the Suggested-by tag.
>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> Applies against commit 0f22abd9096e in mm-new.
>>
>> v1 -> v2:
>> - Collect Reviewed-by from Dev, Wei and Baolin - thanks!
>> - Reduce a level of indentation (per Dev)
>> - https://lore.kernel.org/linux-mm/20251016033643.10848-1-lance.yang@linux.dev/
>>
>> mm/khugepaged.c | 29 ++++++++++++++++-------------
>> 1 file changed, 16 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d635d821f611..648d9335de00 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -516,7 +516,7 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>> pte_t pteval = ptep_get(_pte);
>> unsigned long pfn;
>>
>> - if (pte_none(pteval))
>> + if (!pte_present(pteval))
>> continue;
>> pfn = pte_pfn(pteval);
>> if (is_zero_pfn(pfn))
>> @@ -690,17 +690,18 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
>> address += nr_ptes * PAGE_SIZE) {
>> nr_ptes = 1;
>> pteval = ptep_get(_pte);
>> - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> + if (pte_none(pteval) ||
>> + (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) {
>> add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
>> - if (is_zero_pfn(pte_pfn(pteval))) {
>> - /*
>> - * ptl mostly unnecessary.
>> - */
>> - spin_lock(ptl);
>> - ptep_clear(vma->vm_mm, address, _pte);
>> - spin_unlock(ptl);
>> - ksm_might_unmap_zero_page(vma->vm_mm, pteval);
>> - }
>> + if (pte_none(pteval))
>> + continue;
>
> Yeah I'm not sure I really love this refactoring.
>
> Can be:
>
> if (!is_swap_pte(pteval)) {
> add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> if (!is_zero_pfn(pte_pfn(pteval)))
> continue;
>
> ...
> }
>
> Doing pte_pfn() on a pte_none() PTE is fine.
>
> Obviously as theree's a lot of hate for is_swap_pte() you could also do:
>
> if (pte_none(pteval) || pte_present(pteval)) {
> ...
> }
>
> Which literally open-codes !is_swap_pte().
>
> At the same time, this makes very clear that PTE none is OK.
Emm, I'd prefer the new helper pte_none_or_zero() here:
if (pte_none_or_zero(pteval)) {
add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
if (pte_none(pteval))
continue;
....
}
That looks really clean and simple for me ;)
>
>> + /*
>> + * ptl mostly unnecessary.
>> + */
>> + spin_lock(ptl);
>> + ptep_clear(vma->vm_mm, address, _pte);
>> + spin_unlock(ptl);
>> + ksm_might_unmap_zero_page(vma->vm_mm, pteval);
>> } else {
>> struct page *src_page = pte_page(pteval);
>>
>> @@ -794,7 +795,8 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
>> unsigned long src_addr = address + i * PAGE_SIZE;
>> struct page *src_page;
>>
>> - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> + if (pte_none(pteval) ||
>> + (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) {
>> clear_user_highpage(page, src_addr);
>> continue;
>> }
>> @@ -1294,7 +1296,8 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> goto out_unmap;
>> }
>> }
>> - if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> + if (pte_none(pteval) ||
>> + (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) {
>> ++none_or_zero;
>> if (!userfaultfd_armed(vma) &&
>> (!cc->is_khugepaged ||
>> --
>> 2.49.0
>>
>
> I mean all of this seems super gross anyway. We're constantly open-coding the
> same check over and over again.
>
> static inline bool pte_is_none_or_zero(pte_t pteval)
> {
> if (is_swap_pte(pteval))
> return false;
>
> return is_zero_pfn(pte_pfn(pteval));
> }
>
> Put somewhere in a relevant header file.
>
> Or again, if there's distaste at is_swap_pte(), and here maybe it's more valid
> not to use it (given name of function).
>
> static inline bool pte_is_none_or_zero(pte_t pteval)
> {
> /* Non-present entries do not have a PFN to check. */
> if (!pte_present(pteval))
> return false;
>
> if (pte_none(pteval))
> return true;
>
> return is_zero_pfn(pte_pfn(pteval));
> }
Yeah, I'll put pte_none_or_zero() in this file first.
static inline bool pte_none_or_zero(pte_t pte)
{
if (pte_none(pte))
return true;
return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
}
>
> I think I'm going to do a series to addres the is_swap_pte() mess actually, as
> this whole thing is very frustrating.
Excellent! Looking forward to your series to clean that up ;)
Thanks,
Lance
next prev parent reply other threads:[~2025-10-17 16:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-17 9:38 [PATCH mm-new v2 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present() Lance Yang
2025-10-17 14:42 ` Nico Pache
2025-10-17 14:51 ` David Hildenbrand
2025-10-17 15:04 ` Lance Yang
2025-10-17 15:44 ` Lorenzo Stoakes
2025-10-17 16:33 ` Lance Yang [this message]
2025-10-20 13:55 ` Lorenzo Stoakes
2025-10-20 14:58 ` Lance Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b5627e83-489c-4e16-910c-fe7e56912793@linux.dev \
--to=lance.yang@linux.dev \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=ioworker0@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=npache@redhat.com \
--cc=richard.weiyang@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.