Linux-mm Archive on lore.kernel.org
* [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
@ 2026-06-09 12:04 Lance Yang
  2026-06-09 13:16 ` David Hildenbrand (Arm)
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Lance Yang @ 2026-06-09 12:04 UTC (permalink / raw)
  To: akpm
  Cc: david, ljs, ziy, baolin.wang, liam, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel, Lance Yang

From: Lance Yang <lance.yang@linux.dev>

mthp_collapse() uses mthp_present_ptes to decide whether a range has
enough occupied PTEs to try collapse. Swap PTEs accepted by
collapse_scan_pmd() are counted in unmapped, but are not represented in
mthp_present_ptes.

When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
then checks the PMD-order candidate using the bitmap.

With max_ptes_none set to 0, a range with 511 present PTEs and one swap
PTE no longer reaches collapse_huge_page(), even though PMD collapse can
handle swap PTEs up to max_ptes_swap.

Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
through max_ptes_swap, while lower-order mTHP collapse does not currently
support non-present PTEs. Keep non-present PTEs out of the lower-order
eligibility check.

Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
case. Patch [2] is still only in mm-unstable, so no Fixes: tag.

[1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/

 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b12187709f6d..617bca76db49 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
 		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
 						      offset + nr_ptes);
 
+		/*
+		 * Swap PTEs accepted during the scan are counted in @unmapped,
+		 * not in the present-PTE bitmap. Account them for the PMD-order
+		 * candidate.
+		 */
+		if (is_pmd_order(order))
+			nr_occupied_ptes += unmapped;
+
 		if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
 			enum scan_result ret;
 
-- 
2.49.0
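
Editor's sketch (not part of the patch): the eligibility arithmetic described in the commit message can be modelled in a few lines of standalone C. This is a simplified stand-in, not the kernel function. The helper name range_is_eligible(), the account_swap switch, and the assumption of 4 KiB base pages (PMD order 9, so 512 PTEs) are all illustrative. In the kernel, the present-PTE count comes from the mthp_present_ptes bitmap for the candidate range.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB base pages, so a PMD maps 1 << 9 = 512 PTEs. */
#define SKETCH_PMD_ORDER 9

/*
 * Hypothetical model of the check in mthp_collapse().
 *
 * nr_present:    number of present PTEs in the candidate range
 * unmapped:      swap PTEs accepted by the scan
 * max_ptes_none: how many empty PTEs the candidate may contain
 * account_swap:  false models the pre-patch check, true the patched one
 */
static bool range_is_eligible(unsigned int order,
			      unsigned int nr_present,
			      unsigned int unmapped,
			      unsigned int max_ptes_none,
			      bool account_swap)
{
	unsigned int nr_ptes = 1U << order;
	unsigned int nr_occupied_ptes = nr_present;

	/* Guard the unsigned subtraction below. */
	assert(max_ptes_none <= nr_ptes);

	/* Only the PMD-order candidate may count swap PTEs as occupied. */
	if (account_swap && order == SKETCH_PMD_ORDER)
		nr_occupied_ptes += unmapped;

	return nr_occupied_ptes >= nr_ptes - max_ptes_none;
}
```

With max_ptes_none = 0, a PMD-order range holding 511 present PTEs and one swap PTE fails this model when account_swap is false and passes when it is true. A lower-order range with a swap PTE is rejected either way, which matches the stated intent of keeping non-present PTEs out of the lower-order check.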




* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 12:04 [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting Lance Yang
@ 2026-06-09 13:16 ` David Hildenbrand (Arm)
  2026-06-09 14:33   ` Lorenzo Stoakes
                     ` (2 more replies)
  2026-06-09 13:20 ` David Hildenbrand (Arm)
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 13:16 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: ljs, ziy, baolin.wang, liam, npache, ryan.roberts, dev.jain,
	baohua, linux-mm, linux-kernel

On 6/9/26 14:04, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> mthp_collapse() uses mthp_present_ptes to decide whether a range has
> enough occupied PTEs to try collapse. Swap PTEs accepted by
> collapse_scan_pmd() are counted in unmapped, but are not represented in
> mthp_present_ptes.
> 
> When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
> so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
> then checks the PMD-order candidate using the bitmap.
> 
> With max_ptes_none set to 0, a range with 511 present PTEs and one swap
> PTE no longer reaches collapse_huge_page(), even though PMD collapse can
> handle swap PTEs up to max_ptes_swap.
> 
> Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
> through max_ptes_swap, while lower-order mTHP collapse does not currently
> support non-present PTEs. Keep non-present PTEs out of the lower-order
> eligibility check.
> 
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
> case. Patch [2] is still only in mm-unstable, so no Fixes: tag.
> 
> [1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
> [2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/
> 
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b12187709f6d..617bca76db49 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>  		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
>  						      offset + nr_ptes);
>  
> +		/*
> +		 * Swap PTEs accepted during the scan are counted in @unmapped,
> +		 * not in the present-PTE bitmap. Account them for the PMD-order
> +		 * candidate.
> +		 */
> +		if (is_pmd_order(order))
> +			nr_occupied_ptes += unmapped;
> +

LGTM, there is a bit of opportunity for cleanup in the future :)

Acked-by: David Hildenbrand (Arm) <david@kernel.org>


For example, as we no longer have the VMA here, collapse_max_ptes_none is
imprecise in uffd VMAs. We might try collapsing where there sure is nothing to
collapse.

We could likely handle the userfaultfd_armed() part easier: some indication that
we must not have any pte_none() would be sufficient.

Also, I don't see a good reason why uffd would not be allowed to collapse with
zeropages ... it's really just about missing faults due to pte_none().

-- 
Cheers,

David



* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 12:04 [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting Lance Yang
  2026-06-09 13:16 ` David Hildenbrand (Arm)
@ 2026-06-09 13:20 ` David Hildenbrand (Arm)
  2026-06-09 13:56   ` Lance Yang
  2026-06-09 14:32 ` Lorenzo Stoakes
  2026-06-09 17:08 ` Nico Pache
  3 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 13:20 UTC (permalink / raw)
  To: Lance Yang, akpm
  Cc: ljs, ziy, baolin.wang, liam, npache, ryan.roberts, dev.jain,
	baohua, linux-mm, linux-kernel

On 6/9/26 14:04, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> mthp_collapse() uses mthp_present_ptes to decide whether a range has
> enough occupied PTEs to try collapse. Swap PTEs accepted by
> collapse_scan_pmd() are counted in unmapped, but are not represented in
> mthp_present_ptes.
> 
> When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
> so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
> then checks the PMD-order candidate using the bitmap.
> 
> With max_ptes_none set to 0, a range with 511 present PTEs and one swap
> PTE no longer reaches collapse_huge_page(), even though PMD collapse can
> handle swap PTEs up to max_ptes_swap.
> 
> Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
> through max_ptes_swap, while lower-order mTHP collapse does not currently
> support non-present PTEs. Keep non-present PTEs out of the lower-order
> eligibility check.
> 
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
> case. Patch [2] is still only in mm-unstable, so no Fixes: tag.

Right, probably we just want to add the Fixes: tag once Andrew moves the series
to mm-stable?

-- 
Cheers,

David



* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 13:20 ` David Hildenbrand (Arm)
@ 2026-06-09 13:56   ` Lance Yang
  0 siblings, 0 replies; 9+ messages in thread
From: Lance Yang @ 2026-06-09 13:56 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm
  Cc: ljs, ziy, baolin.wang, liam, npache, ryan.roberts, dev.jain,
	baohua, linux-mm, linux-kernel



On 2026/6/9 21:20, David Hildenbrand (Arm) wrote:
> On 6/9/26 14:04, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> mthp_collapse() uses mthp_present_ptes to decide whether a range has
>> enough occupied PTEs to try collapse. Swap PTEs accepted by
>> collapse_scan_pmd() are counted in unmapped, but are not represented in
>> mthp_present_ptes.
>>
>> When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
>> so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
>> then checks the PMD-order candidate using the bitmap.
>>
>> With max_ptes_none set to 0, a range with 511 present PTEs and one swap
>> PTE no longer reaches collapse_huge_page(), even though PMD collapse can
>> handle swap PTEs up to max_ptes_swap.
>>
>> Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
>> through max_ptes_swap, while lower-order mTHP collapse does not currently
>> support non-present PTEs. Keep non-present PTEs out of the lower-order
>> eligibility check.
>>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
>> case. Patch [2] is still only in mm-unstable, so no Fixes: tag.
> 
> Right, probably we just want to add the Fixes: tag once Andrew moves the series
> to mm-stable?

Yep, hopefully Andrew can add the Fixes: tag when applying this, once
the series lands in mm-stable. Should be soon, I guess :P



* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 12:04 [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting Lance Yang
  2026-06-09 13:16 ` David Hildenbrand (Arm)
  2026-06-09 13:20 ` David Hildenbrand (Arm)
@ 2026-06-09 14:32 ` Lorenzo Stoakes
  2026-06-09 17:08 ` Nico Pache
  3 siblings, 0 replies; 9+ messages in thread
From: Lorenzo Stoakes @ 2026-06-09 14:32 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ziy, baolin.wang, liam, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On Tue, Jun 09, 2026 at 08:04:43PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> mthp_collapse() uses mthp_present_ptes to decide whether a range has
> enough occupied PTEs to try collapse. Swap PTEs accepted by
> collapse_scan_pmd() are counted in unmapped, but are not represented in
> mthp_present_ptes.
>
> When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
> so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
> then checks the PMD-order candidate using the bitmap.
>
> With max_ptes_none set to 0, a range with 511 present PTEs and one swap
> PTE no longer reaches collapse_huge_page(), even though PMD collapse can
> handle swap PTEs up to max_ptes_swap.
>
> Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
> through max_ptes_swap, while lower-order mTHP collapse does not currently
> support non-present PTEs. Keep non-present PTEs out of the lower-order
> eligibility check.
>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>

> ---
> Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
> case. Patch [2] is still only in mm-unstable, so no Fixes: tag.
>
> [1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
> [2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/
>
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b12187709f6d..617bca76db49 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>  		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
>  						      offset + nr_ptes);
>
> +		/*
> +		 * Swap PTEs accepted during the scan are counted in @unmapped,
> +		 * not in the present-PTE bitmap. Account them for the PMD-order
> +		 * candidate.
> +		 */
> +		if (is_pmd_order(order))
> +			nr_occupied_ptes += unmapped;
> +
>  		if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
>  			enum scan_result ret;
>
> --
> 2.49.0
>

Thanks, Lorenzo



* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 13:16 ` David Hildenbrand (Arm)
@ 2026-06-09 14:33   ` Lorenzo Stoakes
  2026-06-09 16:28   ` Lance Yang
  2026-06-09 17:04   ` Nico Pache
  2 siblings, 0 replies; 9+ messages in thread
From: Lorenzo Stoakes @ 2026-06-09 14:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Lance Yang, akpm, ziy, baolin.wang, liam, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On Tue, Jun 09, 2026 at 03:16:10PM +0200, David Hildenbrand (Arm) wrote:
> On 6/9/26 14:04, Lance Yang wrote:
> > From: Lance Yang <lance.yang@linux.dev>
> >
> > mthp_collapse() uses mthp_present_ptes to decide whether a range has
> > enough occupied PTEs to try collapse. Swap PTEs accepted by
> > collapse_scan_pmd() are counted in unmapped, but are not represented in
> > mthp_present_ptes.
> >
> > When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
> > so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
> > then checks the PMD-order candidate using the bitmap.
> >
> > With max_ptes_none set to 0, a range with 511 present PTEs and one swap
> > PTE no longer reaches collapse_huge_page(), even though PMD collapse can
> > handle swap PTEs up to max_ptes_swap.
> >
> > Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
> > through max_ptes_swap, while lower-order mTHP collapse does not currently
> > support non-present PTEs. Keep non-present PTEs out of the lower-order
> > eligibility check.
> >
> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
> > ---
> > Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
> > case. Patch [2] is still only in mm-unstable, so no Fixes: tag.
> >
> > [1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
> > [2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/
> >
> >  mm/khugepaged.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index b12187709f6d..617bca76db49 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
> >  		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
> >  						      offset + nr_ptes);
> >
> > +		/*
> > +		 * Swap PTEs accepted during the scan are counted in @unmapped,
> > +		 * not in the present-PTE bitmap. Account them for the PMD-order
> > +		 * candidate.
> > +		 */
> > +		if (is_pmd_order(order))
> > +			nr_occupied_ptes += unmapped;
> > +
>
> LGTM, there is a bit of opportunity for cleanup in the future :)

From my point of view, accepting the mTHP khugepaged changes was essentially a
big compromise on how much it adds to the mess of the existing code base, and
AFAIC we shouldn't accept any further major changes until we actually sort this
mess out :)

>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>
>
> For example, as we no longer have the VMA here, collapse_max_ptes_none is
> imprecise in uffd VMAs. We might try collapsing where there sure is nothing to
> collapse.
>
> We could likely handle the userfaultfd_armed() part easier: some indication that
> we must not have any pte_none() would be sufficient.
>
> Also, I don't see a good reason why uffd would not be allowed to collapse with
> zeropages ... it's really just about missing faults due to pte_none().

Ugh uffd.

>
> --
> Cheers,
>
> David

Cheers, Lorenzo



* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 13:16 ` David Hildenbrand (Arm)
  2026-06-09 14:33   ` Lorenzo Stoakes
@ 2026-06-09 16:28   ` Lance Yang
  2026-06-09 17:04   ` Nico Pache
  2 siblings, 0 replies; 9+ messages in thread
From: Lance Yang @ 2026-06-09 16:28 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm, ljs
  Cc: ziy, baolin.wang, liam, npache, ryan.roberts, dev.jain, baohua,
	linux-mm, linux-kernel



On 2026/6/9 21:16, David Hildenbrand (Arm) wrote:
> On 6/9/26 14:04, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> mthp_collapse() uses mthp_present_ptes to decide whether a range has
>> enough occupied PTEs to try collapse. Swap PTEs accepted by
>> collapse_scan_pmd() are counted in unmapped, but are not represented in
>> mthp_present_ptes.
>>
>> When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
>> so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
>> then checks the PMD-order candidate using the bitmap.
>>
>> With max_ptes_none set to 0, a range with 511 present PTEs and one swap
>> PTE no longer reaches collapse_huge_page(), even though PMD collapse can
>> handle swap PTEs up to max_ptes_swap.
>>
>> Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
>> through max_ptes_swap, while lower-order mTHP collapse does not currently
>> support non-present PTEs. Keep non-present PTEs out of the lower-order
>> eligibility check.
>>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
>> case. Patch [2] is still only in mm-unstable, so no Fixes: tag.
>>
>> [1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
>> [2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/
>>
>>   mm/khugepaged.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b12187709f6d..617bca76db49 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>>   		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
>>   						      offset + nr_ptes);
>>   
>> +		/*
>> +		 * Swap PTEs accepted during the scan are counted in @unmapped,
>> +		 * not in the present-PTE bitmap. Account them for the PMD-order
>> +		 * candidate.
>> +		 */
>> +		if (is_pmd_order(order))
>> +			nr_occupied_ptes += unmapped;
>> +
> 
> LGTM, there is a bit of opportunity for cleanup in the future :)

Yes, follow-up cleanup material :)

> Acked-by: David Hildenbrand (Arm) <david@kernel.org>

Thanks!

> For example, as we no longer have the VMA here, collapse_max_ptes_none is
> imprecise in uffd VMAs. We might try collapsing where there sure is nothing to
> collapse.

Oh, good catch. We may end up trying a collapse that cannot really
go anywhere ... One for a follow-up.

> We could likely handle the userfaultfd_armed() part easier: some indication that
> we must not have any pte_none() would be sufficient.

Right. By the time we get to mthp_collapse(), we probably only need
to carry that as a small "no pte_none" constraint for the candidate
range.

> Also, I don't see a good reason why uffd would not be allowed to collapse with
> zeropages ... it's really just about missing faults due to pte_none().

Makes sense to me. I'll take a look when I get a chance. And yeah,
as Lorenzo said, better to clean up the khugepaged mess first
before piling more on top :)

Cheers, Lance
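
Editor's sketch (not from the thread's patches): one way to picture the "no pte_none" constraint discussed above is a flag recorded while the VMA is still available and consulted later during the eligibility check. Everything here is hypothetical, including struct candidate_constraints, its no_pte_none field, and both helper names. The follow-up work mentioned in the thread may look quite different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: filled in during the scan, while the VMA is known. */
struct candidate_constraints {
	bool no_pte_none;	/* e.g. set when the VMA is uffd-armed */
};

/* With the constraint set, no empty PTE may be tolerated at all. */
static unsigned int effective_max_ptes_none(const struct candidate_constraints *c,
					    unsigned int max_ptes_none)
{
	return c->no_pte_none ? 0 : max_ptes_none;
}

/* Hypothetical eligibility check that honours the carried constraint. */
static bool candidate_allowed(const struct candidate_constraints *c,
			      unsigned int nr_ptes,
			      unsigned int nr_occupied_ptes,
			      unsigned int max_ptes_none)
{
	unsigned int limit = effective_max_ptes_none(c, max_ptes_none);

	/* Guard the unsigned subtraction below. */
	assert(limit <= nr_ptes);

	return nr_occupied_ptes >= nr_ptes - limit;
}
```

In this model, a 16-PTE range with 15 occupied entries and a tunable of 1 is allowed when the flag is clear and rejected when it is set, so a collapse that could not succeed is never attempted.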



* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 13:16 ` David Hildenbrand (Arm)
  2026-06-09 14:33   ` Lorenzo Stoakes
  2026-06-09 16:28   ` Lance Yang
@ 2026-06-09 17:04   ` Nico Pache
  2 siblings, 0 replies; 9+ messages in thread
From: Nico Pache @ 2026-06-09 17:04 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Lance Yang, akpm, ljs, ziy, baolin.wang, liam, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On Tue, Jun 9, 2026 at 7:16 AM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 6/9/26 14:04, Lance Yang wrote:
> > From: Lance Yang <lance.yang@linux.dev>
> >
> > mthp_collapse() uses mthp_present_ptes to decide whether a range has
> > enough occupied PTEs to try collapse. Swap PTEs accepted by
> > collapse_scan_pmd() are counted in unmapped, but are not represented in
> > mthp_present_ptes.
> >
> > When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
> > so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
> > then checks the PMD-order candidate using the bitmap.
> >
> > With max_ptes_none set to 0, a range with 511 present PTEs and one swap
> > PTE no longer reaches collapse_huge_page(), even though PMD collapse can
> > handle swap PTEs up to max_ptes_swap.
> >
> > Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
> > through max_ptes_swap, while lower-order mTHP collapse does not currently
> > support non-present PTEs. Keep non-present PTEs out of the lower-order
> > eligibility check.
> >
> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
> > ---
> > Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
> > case. Patch [2] is still only in mm-unstable, so no Fixes: tag.
> >
> > [1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
> > [2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/
> >
> >  mm/khugepaged.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index b12187709f6d..617bca76db49 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
> >               nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
> >                                                     offset + nr_ptes);
> >
> > +             /*
> > +              * Swap PTEs accepted during the scan are counted in @unmapped,
> > +              * not in the present-PTE bitmap. Account them for the PMD-order
> > +              * candidate.
> > +              */
> > +             if (is_pmd_order(order))
> > +                     nr_occupied_ptes += unmapped;
> > +
>
> LGTM, there is a bit of opportunity for cleanup in the future :)
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>
>
> For example, as we no longer have the VMA here, collapse_max_ptes_none is
> imprecise in uffd VMAs. We might try collapsing where there sure is nothing to
> collapse.
>
> We could likely handle the userfaultfd_armed() part easier: some indication that
> we must not have any pte_none() would be sufficient.
>
> Also, I don't see a good reason why uffd would not be allowed to collapse with
> zeropages ... it's really just about missing faults due to pte_none().

I have some patches exactly for this :) so far 2-3 patches for better
uffd handling

>
> --
> Cheers,
>
> David
>




* Re: [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting
  2026-06-09 12:04 [PATCH mm-unstable 1/1] mm/khugepaged: fix PMD collapse swap PTE accounting Lance Yang
                   ` (2 preceding siblings ...)
  2026-06-09 14:32 ` Lorenzo Stoakes
@ 2026-06-09 17:08 ` Nico Pache
  3 siblings, 0 replies; 9+ messages in thread
From: Nico Pache @ 2026-06-09 17:08 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ljs, ziy, baolin.wang, liam, ryan.roberts, dev.jain,
	baohua, linux-mm, linux-kernel

On Tue, Jun 9, 2026 at 6:05 AM Lance Yang <lance.yang@linux.dev> wrote:
>
> From: Lance Yang <lance.yang@linux.dev>
>
> mthp_collapse() uses mthp_present_ptes to decide whether a range has
> enough occupied PTEs to try collapse. Swap PTEs accepted by
> collapse_scan_pmd() are counted in unmapped, but are not represented in
> mthp_present_ptes.
>
> When lower orders are enabled, collapse_scan_pmd() relaxes max_ptes_none
> so the scan can cover the whole PMD and build the bitmap. mthp_collapse()
> then checks the PMD-order candidate using the bitmap.
>
> With max_ptes_none set to 0, a range with 511 present PTEs and one swap
> PTE no longer reaches collapse_huge_page(), even though PMD collapse can
> handle swap PTEs up to max_ptes_swap.
>
> Account unmapped PTEs only for PMD order. PMD collapse supports swap PTEs
> through max_ptes_swap, while lower-order mTHP collapse does not currently
> support non-present PTEs. Keep non-present PTEs out of the lower-order
> eligibility check.
>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> Sent separately, as discussed in [1], to spell out the PMD-order swap PTE
> case. Patch [2] is still only in mm-unstable, so no Fixes: tag.

Thanks for sending the fixup :)

Acked-by: Nico Pache <npache@redhat.com>

>
> [1] https://lore.kernel.org/linux-mm/CAA1CXcD7WAiA1b9GTLAuNZ+kHaFx0SzZwpBkqAZ=s+RHsTUaow@mail.gmail.com/
> [2] https://lore.kernel.org/linux-mm/20260605161422.213817-12-npache@redhat.com/
>
>  mm/khugepaged.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b12187709f6d..617bca76db49 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1508,6 +1508,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>                 nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
>                                                       offset + nr_ptes);
>
> +               /*
> +                * Swap PTEs accepted during the scan are counted in @unmapped,
> +                * not in the present-PTE bitmap. Account them for the PMD-order
> +                * candidate.
> +                */
> +               if (is_pmd_order(order))
> +                       nr_occupied_ptes += unmapped;
> +
>                 if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
>                         enum scan_result ret;
>
> --
> 2.49.0
>



