All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: Dev Jain <dev.jain@arm.com>
Cc: david@redhat.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, hughd@google.com,
	ioworker0@gmail.com, kirill@shutemov.name,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mpenttil@redhat.com, npache@redhat.com, ryan.roberts@arm.com,
	ziy@nvidia.com, richard.weiyang@gmail.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries
Date: Wed, 1 Oct 2025 18:48:07 +0800	[thread overview]
Message-ID: <35c9a208-a7e6-41ab-aa3e-dbfdffaf5f44@linux.dev> (raw)
In-Reply-To: <a1df234c-d003-4696-8f1f-609a360fdeda@arm.com>



On 2025/10/1 18:20, Dev Jain wrote:
> 
> On 01/10/25 8:52 am, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> Currently, special non-swap entries (like migration, hwpoison, or PTE
>> markers) are not caught early in hpage_collapse_scan_pmd(), leading to
>> failures deep in the swap-in logic.
>>
>> hpage_collapse_scan_pmd()
>>   `- collapse_huge_page()
>>       `- __collapse_huge_page_swapin() -> fails!
>>
>> As David suggested[1], this patch skips any such non-swap entries
>> early. If any one is found, the scan is aborted immediately with the
>> SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
>> work.
>>
>> [1] https://lore.kernel.org/linux-mm/7840f68e-7580-42cb- 
>> a7c8-1ba64fd6df69@redhat.com
>> [2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680- 
>> dcd55219c8bd@lucifer.local
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> v1 -> v2:
>>   - Skip all non-present entries except swap entries (per David) thanks!
>>   - https://lore.kernel.org/linux-mm/20250924100207.28332-1- 
>> lance.yang@linux.dev/
>>
>>   mm/khugepaged.c | 32 ++++++++++++++++++--------------
>>   1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7ab2d1a42df3..d0957648db19 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1284,7 +1284,23 @@ static int hpage_collapse_scan_pmd(struct 
>> mm_struct *mm,
>>       for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>            _pte++, addr += PAGE_SIZE) {
>>           pte_t pteval = ptep_get(_pte);
>> -        if (is_swap_pte(pteval)) {
>> +        if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> +            ++none_or_zero;
>> +            if (!userfaultfd_armed(vma) &&
>> +                (!cc->is_khugepaged ||
>> +                 none_or_zero <= khugepaged_max_ptes_none)) {
>> +                continue;
>> +            } else {
>> +                result = SCAN_EXCEED_NONE_PTE;
>> +                count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +                goto out_unmap;
>> +            }
>> +        } else if (!pte_present(pteval)) {
> 
> If you are trying to merge this with the _isolate() conditions, we can do
> a micro-optimization here - is_swap_pte, (pte_none && is_zero_pfn), and 
> pte_uffd_wp
> are disjoint conditions, so we can use if-else-if-else-if to write them.

Ah, indeed, thanks!

I think it would fit better into the follow-up patch that unifies the
scanning logic, and I'll make sure to include it there ;p

> 
>> +            if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> +                result = SCAN_PTE_NON_PRESENT;
>> +                goto out_unmap;
>> +            }
>> +
>>               ++unmapped;
>>               if (!cc->is_khugepaged ||
>>                   unmapped <= khugepaged_max_ptes_swap) {
>> @@ -1293,7 +1309,7 @@ static int hpage_collapse_scan_pmd(struct 
>> mm_struct *mm,
>>                    * enabled swap entries.  Please see
>>                    * comment below for pte_uffd_wp().
>>                    */
>> -                if (pte_swp_uffd_wp_any(pteval)) {
>> +                if (pte_swp_uffd_wp(pteval)) {
>>                       result = SCAN_PTE_UFFD_WP;
> 
> Could have mentioned in the changelog "while at it, convert 
> pte_swp_uffd_wp_any to
> pte_swp_uffd_wp since we are in the swap pte branch".

Right, that would have been clearer.

I'll add that if a next version is needed :)

> 
>>                       goto out_unmap;
>>                   }
>> @@ -1304,18 +1320,6 @@ static int hpage_collapse_scan_pmd(struct 
>> mm_struct *mm,
>>                   goto out_unmap;
>>               }
>>           }
>> -        if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> -            ++none_or_zero;
>> -            if (!userfaultfd_armed(vma) &&
>> -                (!cc->is_khugepaged ||
>> -                 none_or_zero <= khugepaged_max_ptes_none)) {
>> -                continue;
>> -            } else {
>> -                result = SCAN_EXCEED_NONE_PTE;
>> -                count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> -                goto out_unmap;
>> -            }
>> -        }
>>           if (pte_uffd_wp(pteval)) {
>>               /*
>>                * Don't collapse the page if any of the small
> 
> Otherwise LGTM
> 
> Reviewed-by: Dev Jain <dev.jain@arm.com>

Cheers!



      reply	other threads:[~2025-10-01 10:48 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-01  3:22 [PATCH mm-new v2 1/1] mm/khugepaged: abort collapse scan on non-swap entries Lance Yang
2025-10-01  8:31 ` David Hildenbrand
2025-10-01  9:38   ` Lance Yang
2025-10-01  8:54 ` Wei Yang
2025-10-01  9:15   ` David Hildenbrand
2025-10-01 10:05   ` Lance Yang
2025-10-01 13:52     ` Wei Yang
2025-10-05  1:05     ` Wei Yang
2025-10-05  2:12       ` Lance Yang
2025-10-06 14:18         ` David Hildenbrand
2025-10-06 15:02           ` Lance Yang
2025-10-07 10:25           ` Lance Yang
2025-10-08  1:37             ` Wei Yang
2025-10-08  9:00             ` David Hildenbrand
2025-10-01 10:20 ` Dev Jain
2025-10-01 10:48   ` Lance Yang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=35c9a208-a7e6-41ab-aa3e-dbfdffaf5f44@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=hughd@google.com \
    --cc=ioworker0@gmail.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mpenttil@redhat.com \
    --cc=npache@redhat.com \
    --cc=richard.weiyang@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.