* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
@ 2025-12-02 7:53 ` Baolin Wang
2025-12-03 13:40 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2025-12-02 7:53 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-trace-kernel, linux-mm, linux-doc
Cc: david, ziy, lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain,
corbet, rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy,
peterx, wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato
On 2025/12/2 01:46, Nico Pache wrote:
> The current mechanism for determining mTHP collapse scales the
> khugepaged_max_ptes_none value based on the target order. This
> introduces an undesirable feedback loop, or "creep", when max_ptes_none
> is set to a value greater than HPAGE_PMD_NR / 2.
>
> With this configuration, a successful collapse to order N will populate
> enough pages to satisfy the collapse condition on order N+1 on the next
> scan. This leads to unnecessary work and memory churn.
>
> To fix this issue introduce a helper function that will limit mTHP
> collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
> This effectively supports two modes:
>
> - max_ptes_none=0: never introduce new none-pages for mTHP collapse.
> - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
> available mTHP order.
>
> This removes the possibility of "creep", while not modifying any uAPI
> expectations. A warning will be emitted if any non-supported
> max_ptes_none value is configured with mTHP enabled.
>
> The limits can be ignored by passing full_scan=true, this is useful for
> madvise_collapse (which ignores limits), or in the case of
> collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
> collapse is available.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8dab49c53128..f425238d5d4f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
> wake_up_interruptible(&khugepaged_wait);
> }
>
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
> + * @order: The folio order being collapsed to
> + * @full_scan: Whether this is a full scan (ignore limits)
> + *
> + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
> + * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
> + *
> + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
> + * khugepaged_max_ptes_none value.
> + *
> + * For mTHP collapses, we currently only support khugepaged_max_ptes_none values
> + * of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and no mTHP
> + * collapse will be attempted.
> + *
> + * Return: Maximum number of empty PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
> +{
> + /* ignore max_ptes_none limits */
> + if (full_scan)
> + return HPAGE_PMD_NR - 1;
> +
> + if (!is_mthp_order(order))
> + return khugepaged_max_ptes_none;
> +
> + /* Zero/non-present collapse disabled. */
> + if (!khugepaged_max_ptes_none)
> + return 0;
> +
> + if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> + return (1 << order) - 1;
> +
> + pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
> + HPAGE_PMD_NR - 1);
> + return -EINVAL;
> +}
Thanks. That aligns with what we talked about previously. So
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
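The decision table implemented by the helper quoted above can be sketched as a small user-space model. This is only an illustration, not the kernel code: MODEL_PMD_ORDER assumes 4K pages with a 2M PMD (order 9), model_max_ptes_none() and its 'configured' parameter are hypothetical stand-ins for collapse_max_ptes_none() and khugepaged_max_ptes_none, and is_mthp_order() is assumed to mean "any order below the PMD order". The model returns int so that the -EINVAL sentinel is representable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption: 4K base pages and a 2M PMD, so a PMD maps 512 PTEs. */
#define MODEL_PMD_ORDER 9
#define MODEL_PMD_NR    (1 << MODEL_PMD_ORDER)

/*
 * Hypothetical user-space model of the helper's decision table.
 * 'configured' stands in for khugepaged_max_ptes_none.
 */
static int model_max_ptes_none(unsigned int order, bool full_scan,
			       unsigned int configured)
{
	assert(order <= MODEL_PMD_ORDER);

	/* madvise-style full scan: limits are ignored. */
	if (full_scan)
		return MODEL_PMD_NR - 1;

	/* PMD-sized collapse: use the configured value unchanged. */
	if (order == MODEL_PMD_ORDER)
		return (int)configured;

	/* mTHP collapse: only the two extreme settings are supported. */
	if (configured == 0)
		return 0;
	if (configured == MODEL_PMD_NR - 1)
		return (1 << order) - 1;

	return -EINVAL;
}
```

In this model an order-4 request returns 15 when the configured value is 511, 0 when it is 0, and -EINVAL for anything in between.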
* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
2025-12-02 7:53 ` Baolin Wang
@ 2025-12-03 13:40 ` kernel test robot
2025-12-03 21:02 ` Nico Pache
2025-12-16 8:12 ` Baolin Wang
3 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-12-03 13:40 UTC (permalink / raw)
To: Nico Pache; +Cc: llvm, oe-kbuild-all
Hi Nico,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20251203]
[cannot apply to linus/master v6.18]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Nico-Pache/khugepaged-rename-hpage_collapse_-to-collapse_/20251202-015150
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20251201174627.23295-8-npache%40redhat.com
patch subject: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20251203/202512032107.44KoCA71-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251203/202512032107.44KoCA71-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512032107.44KoCA71-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/khugepaged.c:593:6: warning: variable '_pte' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
593 | if (max_ptes_none == -EINVAL)
| ^~~~~~~~~~~~~~~~~~~~~~~~
mm/khugepaged.c:724:25: note: uninitialized use occurs here
724 | release_pte_pages(pte, _pte, compound_pagelist);
| ^~~~
mm/khugepaged.c:593:2: note: remove the 'if' if its condition is always false
593 | if (max_ptes_none == -EINVAL)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
594 | goto out;
| ~~~~~~~~
mm/khugepaged.c:588:13: note: initialize the variable '_pte' to silence this warning
588 | pte_t *_pte;
| ^
| = NULL
1 warning generated.
vim +593 mm/khugepaged.c
580
581 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
582 unsigned long start_addr, pte_t *pte, struct collapse_control *cc,
583 unsigned int order, struct list_head *compound_pagelist)
584 {
585 struct page *page = NULL;
586 struct folio *folio = NULL;
587 unsigned long addr = start_addr;
588 pte_t *_pte;
589 int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
590 const unsigned long nr_pages = 1UL << order;
591 int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
592
> 593 if (max_ptes_none == -EINVAL)
594 goto out;
595
596 for (_pte = pte; _pte < pte + nr_pages;
597 _pte++, addr += PAGE_SIZE) {
598 pte_t pteval = ptep_get(_pte);
599 if (pte_none_or_zero(pteval)) {
600 ++none_or_zero;
601 if (!userfaultfd_armed(vma) &&
602 (!cc->is_khugepaged ||
603 none_or_zero <= max_ptes_none)) {
604 continue;
605 } else {
606 result = SCAN_EXCEED_NONE_PTE;
607 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
608 goto out;
609 }
610 }
611 if (!pte_present(pteval)) {
612 result = SCAN_PTE_NON_PRESENT;
613 goto out;
614 }
615 if (pte_uffd_wp(pteval)) {
616 result = SCAN_PTE_UFFD_WP;
617 goto out;
618 }
619 page = vm_normal_page(vma, addr, pteval);
620 if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
621 result = SCAN_PAGE_NULL;
622 goto out;
623 }
624
625 folio = page_folio(page);
626 VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
627
628 /* See collapse_scan_pmd(). */
629 if (folio_maybe_mapped_shared(folio)) {
630 ++shared;
631 /*
632 * TODO: Support shared pages without leading to further
633 * mTHP collapses. Currently bringing in new pages via
634 * shared may cause a future higher order collapse on a
635 * rescan of the same range.
636 */
637 if (is_mthp_order(order) || (cc->is_khugepaged &&
638 shared > khugepaged_max_ptes_shared)) {
639 result = SCAN_EXCEED_SHARED_PTE;
640 count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
641 goto out;
642 }
643 }
644
645 if (folio_test_large(folio)) {
646 struct folio *f;
647
648 /*
649 * Check if we have dealt with the compound page
650 * already
651 */
652 list_for_each_entry(f, compound_pagelist, lru) {
653 if (folio == f)
654 goto next;
655 }
656 }
657
658 /*
659 * We can do it before folio_isolate_lru because the
660 * folio can't be freed from under us. NOTE: PG_lock
661 * is needed to serialize against split_huge_page
662 * when invoked from the VM.
663 */
664 if (!folio_trylock(folio)) {
665 result = SCAN_PAGE_LOCK;
666 goto out;
667 }
668
669 /*
670 * Check if the page has any GUP (or other external) pins.
671 *
672 * The page table that maps the page has been already unlinked
673 * from the page table tree and this process cannot get
674 * an additional pin on the page.
675 *
676 * New pins can come later if the page is shared across fork,
677 * but not from this process. The other process cannot write to
678 * the page, only trigger CoW.
679 */
680 if (folio_expected_ref_count(folio) != folio_ref_count(folio)) {
681 folio_unlock(folio);
682 result = SCAN_PAGE_COUNT;
683 goto out;
684 }
685
686 /*
687 * Isolate the page to avoid collapsing an hugepage
688 * currently in use by the VM.
689 */
690 if (!folio_isolate_lru(folio)) {
691 folio_unlock(folio);
692 result = SCAN_DEL_PAGE_LRU;
693 goto out;
694 }
695 node_stat_mod_folio(folio,
696 NR_ISOLATED_ANON + folio_is_file_lru(folio),
697 folio_nr_pages(folio));
698 VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
699 VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
700
701 if (folio_test_large(folio))
702 list_add_tail(&folio->lru, compound_pagelist);
703 next:
704 /*
705 * If collapse was initiated by khugepaged, check that there is
706 * enough young pte to justify collapsing the page
707 */
708 if (cc->is_khugepaged &&
709 (pte_young(pteval) || folio_test_young(folio) ||
710 folio_test_referenced(folio) ||
711 mmu_notifier_test_young(vma->vm_mm, addr)))
712 referenced++;
713 }
714
715 if (unlikely(cc->is_khugepaged && !referenced)) {
716 result = SCAN_LACK_REFERENCED_PAGE;
717 } else {
718 result = SCAN_SUCCEED;
719 trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
720 referenced, result);
721 return result;
722 }
723 out:
724 release_pte_pages(pte, _pte, compound_pagelist);
725 trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
726 referenced, result);
727 return result;
728 }
729
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
2025-12-02 7:53 ` Baolin Wang
2025-12-03 13:40 ` kernel test robot
@ 2025-12-03 21:02 ` Nico Pache
2025-12-16 8:12 ` Baolin Wang
3 siblings, 0 replies; 8+ messages in thread
From: Nico Pache @ 2025-12-03 21:02 UTC (permalink / raw)
To: akpm, linux-kernel, linux-trace-kernel, linux-mm, linux-doc
Cc: david, ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett,
ryan.roberts, dev.jain, corbet, rostedt, mhiramat,
mathieu.desnoyers, baohua, willy, peterx, wangkefeng.wang,
usamaarif642, sunnanyong, vishal.moola, thomas.hellstrom, yang,
kas, aarcange, raquini, anshuman.khandual, catalin.marinas, tiwai,
will, dave.hansen, jack, cl, jglisse, surenb, zokeefe, hannes,
rientjes, mhocko, rdunlap, hughd, richard.weiyang, lance.yang,
vbabka, rppt, jannh, pfalcato
Hi Andrew,
The bot has reported a potential use of an uninitialized variable.
Could you please squash the following fixup into this commit?
Thank you,
Nico
----8<----
From 846f79d91a25ebad76cbab3690ae315cfe3cf278 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Wed, 3 Dec 2025 13:42:18 -0700
Subject: [PATCH] khugepaged: fixup uninitialized _pte variable
There is a potential use of an uninitialized variable after
`khugepaged: introduce collapse_max_ptes_none helper function`.
Andrew, can you please append this to patch 7 of my series? The issue was
reported by the kernel test robot:
>> mm/khugepaged.c:593:6: warning: variable '_pte' is used uninitialized
whenever 'if' condition is true [-Wsometimes-uninitialized]
593 | if (max_ptes_none == -EINVAL)
| ^~~~~~~~~~~~~~~~~~~~~~~~
mm/khugepaged.c:724:25: note: uninitialized use occurs here
724 | release_pte_pages(pte, _pte, compound_pagelist);
| ^~~~
mm/khugepaged.c:593:2: note: remove the 'if' if its condition is always false
593 | if (max_ptes_none == -EINVAL)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
594 | goto out;
| ~~~~~~~~
mm/khugepaged.c:588:13: note: initialize the variable '_pte' to silence this warning
588 | pte_t *_pte;
| ^
| = NULL
1 warning generated.
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f425238d5d4f..7c7d04d6737e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -585,7 +585,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
struct page *page = NULL;
struct folio *folio = NULL;
unsigned long addr = start_addr;
- pte_t *_pte;
+ pte_t *_pte = pte;
int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
const unsigned long nr_pages = 1UL << order;
int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
@@ -593,8 +593,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
if (max_ptes_none == -EINVAL)
goto out;
- for (_pte = pte; _pte < pte + nr_pages;
- _pte++, addr += PAGE_SIZE) {
+ for (; _pte < pte + nr_pages; _pte++, addr += PAGE_SIZE) {
pte_t pteval = ptep_get(_pte);
if (pte_none_or_zero(pteval)) {
++none_or_zero;
--
2.52.0
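Why initializing _pte at its declaration silences the warning can be shown with a reduced user-space model. The names below (isolate_goto_model, release_model, LIMIT_INVALID) are hypothetical, and the release loop is only assumed to walk backwards from the cursor to the start of the range, which is how the call release_pte_pages(pte, _pte, ...) reads. With the cursor equal to the start, the cleanup path is still taken but has nothing to release.

```c
#include <assert.h>
#include <stddef.h>

#define LIMIT_INVALID (-22)	/* stand-in for -EINVAL */

/* Count the entries in [start, cur) that cleanup would release. */
static size_t release_model(const int *start, const int *cur)
{
	size_t released = 0;

	while (cur > start) {
		cur--;
		released++;
	}
	return released;
}

/* A negative entry models a PTE that fails one of the isolate checks. */
static int isolate_goto_model(const int *items, size_t n, int limit,
			      size_t *released)
{
	const int *cur = items;	/* initialized before any goto can run */
	int result = -1;

	assert(released != NULL);

	if (limit == LIMIT_INVALID)
		goto out;

	for (; cur < items + n; cur++) {
		if (*cur < 0)
			goto out;
	}
	*released = 0;
	return 0;
out:
	*released = release_model(items, cur);
	return result;
}
```

Calling the model with LIMIT_INVALID reports zero released entries, whereas a failure on the third entry of a four-entry array reports two.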
On 12/1/25 10:46 AM, Nico Pache wrote:
> [...]
> void khugepaged_enter_vma(struct vm_area_struct *vma,
> vm_flags_t vm_flags)
> {
> @@ -550,7 +588,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> pte_t *_pte;
> int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> const unsigned long nr_pages = 1UL << order;
> - int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> + int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
> +
> + if (max_ptes_none == -EINVAL)
> + goto out;
>
> for (_pte = pte; _pte < pte + nr_pages;
> _pte++, addr += PAGE_SIZE) {
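The "creep" that motivates dropping the scaled limit can be reproduced with simple arithmetic. The sketch below is a user-space model, not kernel code: it assumes 4K pages (HPAGE_PMD_ORDER of 9), uses the removed expression khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order) together with the none_or_zero <= max_ptes_none check, and starts from a fully populated region inside an otherwise empty PMD range. The function names and the starting order are hypothetical.

```c
#include <assert.h>

#define MODEL_PMD_ORDER 9	/* assumption: 4K pages, 2M PMD */

/* Scaled limit as computed by the line the patch removes. */
static int scaled_limit(int max_ptes_none, int order)
{
	return max_ptes_none >> (MODEL_PMD_ORDER - order);
}

/*
 * Start with one fully populated region of 'order' whose neighbours are
 * all empty, and keep collapsing to the next order while the scaled
 * limit allows it. Returns the order at which the growth stops.
 */
static int creep_final_order(int max_ptes_none, int order)
{
	assert(order >= 0 && order <= MODEL_PMD_ORDER);

	while (order < MODEL_PMD_ORDER) {
		/* The enclosing order+1 region is exactly half empty. */
		int none = 1 << order;

		if (none > scaled_limit(max_ptes_none, order + 1))
			break;
		order++;	/* collapse succeeds and fills the region */
	}
	return order;
}
```

Under these assumptions a setting of 300 lets an order-2 region grow step by step up to the PMD order, while a setting of 255 leaves it at order 2.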
* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
` (2 preceding siblings ...)
2025-12-03 21:02 ` Nico Pache
@ 2025-12-16 8:12 ` Baolin Wang
2025-12-16 23:26 ` Nico Pache
3 siblings, 1 reply; 8+ messages in thread
From: Baolin Wang @ 2025-12-16 8:12 UTC (permalink / raw)
To: Nico Pache, linux-kernel, linux-trace-kernel, linux-mm, linux-doc
Cc: david, ziy, lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain,
corbet, rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy,
peterx, wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato
Hi Nico,
On 2025/12/2 01:46, Nico Pache wrote:
> The current mechanism for determining mTHP collapse scales the
> khugepaged_max_ptes_none value based on the target order. This
> introduces an undesirable feedback loop, or "creep", when max_ptes_none
> is set to a value greater than HPAGE_PMD_NR / 2.
>
> With this configuration, a successful collapse to order N will populate
> enough pages to satisfy the collapse condition on order N+1 on the next
> scan. This leads to unnecessary work and memory churn.
>
> To fix this issue introduce a helper function that will limit mTHP
> collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
> This effectively supports two modes:
>
> - max_ptes_none=0: never introduce new none-pages for mTHP collapse.
> - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
> available mTHP order.
>
> This removes the possibility of "creep", while not modifying any uAPI
> expectations. A warning will be emitted if any non-supported
> max_ptes_none value is configured with mTHP enabled.
>
> The limits can be ignored by passing full_scan=true, this is useful for
> madvise_collapse (which ignores limits), or in the case of
> collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
> collapse is available.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8dab49c53128..f425238d5d4f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
> wake_up_interruptible(&khugepaged_wait);
> }
>
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
> + * @order: The folio order being collapsed to
> + * @full_scan: Whether this is a full scan (ignore limits)
> + *
> + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
> + * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
> + *
> + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
> + * khugepaged_max_ptes_none value.
> + *
> + * For mTHP collapses, we currently only support khugepaged_max_ptes_none values
> + * of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and no mTHP
> + * collapse will be attempted.
> + *
> + * Return: Maximum number of empty PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
> +{
> + /* ignore max_ptes_none limits */
> + if (full_scan)
> + return HPAGE_PMD_NR - 1;
> +
> + if (!is_mthp_order(order))
> + return khugepaged_max_ptes_none;
> +
> + /* Zero/non-present collapse disabled. */
> + if (!khugepaged_max_ptes_none)
> + return 0;
> +
> + if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> + return (1 << order) - 1;
> +
> + pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
> + HPAGE_PMD_NR - 1);
> + return -EINVAL;
> +}
> +
> void khugepaged_enter_vma(struct vm_area_struct *vma,
> vm_flags_t vm_flags)
> {
> @@ -550,7 +588,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> pte_t *_pte;
> int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> const unsigned long nr_pages = 1UL << order;
> - int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> + int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
> +
> + if (max_ptes_none == -EINVAL)
> + goto out;
After testing your patchset, I hit the following crash. The reason is
that when 'max_ptes_none' is -EINVAL here, it shouldn't goto out to call
release_pte_pages(), because the '_pte' hasn't been initialized at this
point, and there's no need to release folios either.
After applying the fix below, the crash issue is resolved. I'm not sure
whether Andrew will help fix this or if you will send a new version to
address this issue.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8cffaf59ced8..2e8171a6d7df 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -646,7 +646,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
if (max_ptes_none == -EINVAL)
- goto out;
+ return result;
for (_pte = pte; _pte < pte + nr_pages;
_pte++, addr += PAGE_SIZE) {
"
[ 565.319345] Unable to handle kernel paging request at virtual address fffffffffffffffa
.......
[ 565.319409] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001f8549a000
[ 565.319416] [fffffffffffffffa] pgd=0000001f85f2a403, p4d=0000001f85f2a403, pud=0000001f85f2b403, pmd=0000000000000000
[ 565.319427] Internal error: Oops: 0000000096000006 [#1] SMP
.......
[ 565.326733] pc : release_pte_pages+0x68/0x178
[ 565.326960] lr : __collapse_huge_page_isolate+0xc0/0x748
[ 565.327232] sp : ffff800083593910
.......
[ 565.331476] Call trace:
[ 565.331664] release_pte_pages+0x68/0x178 (P)
[ 565.331940] __collapse_huge_page_isolate+0xc0/0x748
[ 565.332249] collapse_huge_page+0x4cc/0xa70
[ 565.332510] mthp_collapse+0x254/0x2a8
[ 565.332754] collapse_scan_pmd+0x5a0/0x6d8
[ 565.333010] collapse_single_pmd+0x214/0x288
[ 565.333275] collapse_scan_mm_slot.constprop.0+0x2ac/0x460
[ 565.333617] khugepaged+0x204/0x2c8
[ 565.333992] kthread+0xf8/0x110
[ 565.334368] ret_from_fork+0x10/0x20
"
>
> for (_pte = pte; _pte < pte + nr_pages;
> _pte++, addr += PAGE_SIZE) {
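The early return suggested above can be contrasted with the goto in a reduced model. This is a sketch with hypothetical names (isolate_return_model, cleanup_calls, LIMIT_INVALID), not the kernel function: a counter records how often the cleanup label is reached, so it is visible that an invalid limit leaves before any cleanup, while a failure inside the loop still releases what was already visited.

```c
#include <assert.h>
#include <stddef.h>

#define LIMIT_INVALID (-22)	/* stand-in for -EINVAL */

static unsigned int cleanup_calls;	/* times the 'out' label was reached */

/* A negative entry models a PTE that fails one of the isolate checks. */
static int isolate_return_model(const int *items, size_t n, int limit,
				size_t *released)
{
	const int *cur;
	int result = -1;

	assert(released != NULL);
	*released = 0;

	/* Nothing has been isolated yet, so there is nothing to undo. */
	if (limit == LIMIT_INVALID)
		return result;

	for (cur = items; cur < items + n; cur++) {
		if (*cur < 0)
			goto out;
	}
	return 0;
out:
	cleanup_calls++;
	*released = (size_t)(cur - items);	/* entries visited so far */
	return result;
}
```

Because the cursor is only read after the loop has assigned it, this shape needs no initializer at the declaration.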
* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
2025-12-16 8:12 ` Baolin Wang
@ 2025-12-16 23:26 ` Nico Pache
2025-12-17 1:33 ` Baolin Wang
0 siblings, 1 reply; 8+ messages in thread
From: Nico Pache @ 2025-12-16 23:26 UTC (permalink / raw)
To: Baolin Wang
Cc: linux-kernel, linux-trace-kernel, linux-mm, linux-doc, david, ziy,
lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain, corbet,
rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy, peterx,
wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato
On Tue, Dec 16, 2025 at 1:12 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> Hi Nico,
Hi Baolin! Thanks for testing :)
Did you happen to test with the changes I asked Andrew to append to
this commit?
Either way, I think your fixup makes more sense than mine.
Cheers,
-- Nico
* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
2025-12-16 23:26 ` Nico Pache
@ 2025-12-17 1:33 ` Baolin Wang
0 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2025-12-17 1:33 UTC (permalink / raw)
To: Nico Pache
Cc: linux-kernel, linux-trace-kernel, linux-mm, linux-doc, david, ziy,
lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain, corbet,
rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy, peterx,
wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato
On 2025/12/17 07:26, Nico Pache wrote:
> On Tue, Dec 16, 2025 at 1:12 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> Hi Nico,
>
> Hi Baolin! Thanks for testing :)
>
> Did you happen to test with the changes I asked Andrew to append to
> this commit?
>
> Either way, I think your fixup makes more sense than mine.

Ah, I did not notice your fixup commit earlier, which seems to address
this problem. And as I said, we don't need to call release_pte_pages()
in this case.
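
To illustrate the point, here is a small userspace sketch of the pattern. It
is not the khugepaged code: isolate_model(), release_model() and the MODEL_*
constants are hypothetical names invented for this example. The cursor plays
the role of '_pte', and the cleanup at 'out' walks the range up to the cursor,
so the error path has to return before the cursor has been assigned.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EINVAL 22
#define MODEL_FAIL   (-1)	/* stand-in for SCAN_FAIL */
#define MODEL_OK     0

/* Hypothetical cleanup: undo the work done on every slot in [first, cursor). */
static int release_model(int *first, int *cursor)
{
	int released = 0;

	for (int *p = first; p < cursor; p++) {
		*p = 0;
		released++;
	}
	return released;
}

/* Hypothetical isolate step; a negative slot models an isolation failure. */
static int isolate_model(int *slots, size_t nr, int max_none, int *released)
{
	int *cursor;		/* not initialized here, like '_pte' */
	int result = MODEL_FAIL;

	assert(slots && released);
	*released = 0;

	/* Nothing has been isolated yet, so return without any cleanup. */
	if (max_none == -MODEL_EINVAL)
		return result;

	for (cursor = slots; cursor < slots + nr; cursor++) {
		if (*cursor < 0)
			goto out;
		*cursor = 1;	/* mark the slot as isolated */
	}
	return MODEL_OK;

out:
	/* Safe: the loop above has assigned the cursor by now. */
	*released = release_model(slots, cursor);
	return result;
}
```

With 'goto out' in place of the early return, release_model() would be handed
an indeterminate cursor, which is the situation the oops below reflects.
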
>> On 2025/12/2 01:46, Nico Pache wrote:
>>> The current mechanism for determining mTHP collapse scales the
>>> khugepaged_max_ptes_none value based on the target order. This
>>> introduces an undesirable feedback loop, or "creep", when max_ptes_none
>>> is set to a value greater than HPAGE_PMD_NR / 2.
>>>
>>> With this configuration, a successful collapse to order N will populate
>>> enough pages to satisfy the collapse condition on order N+1 on the next
>>> scan. This leads to unnecessary work and memory churn.
>>>
>>> To fix this issue introduce a helper function that will limit mTHP
>>> collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
>>> This effectively supports two modes:
>>>
>>> - max_ptes_none=0: never introduce new none-pages for mTHP collapse.
>>> - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
>>> available mTHP order.
>>>
>>> This removes the possibility of "creep", while not modifying any uAPI
>>> expectations. A warning will be emitted if any non-supported
>>> max_ptes_none value is configured with mTHP enabled.
>>>
>>> The limits can be ignored by passing full_scan=true. This is useful for
>>> madvise_collapse (which ignores limits) and, in the case of
>>> collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
>>> collapse is available.
>>>
>>> Signed-off-by: Nico Pache <npache@redhat.com>
>>> ---
>>> mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 42 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 8dab49c53128..f425238d5d4f 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
>>> wake_up_interruptible(&khugepaged_wait);
>>> }
>>>
>>> +/**
>>> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
>>> + * @order: The folio order being collapsed to
>>> + * @full_scan: Whether this is a full scan (ignore limits)
>>> + *
>>> + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
>>> + * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
>>> + *
>>> + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
>>> + * khugepaged_max_ptes_none value.
>>> + *
>>> + * For mTHP collapses, we currently only support khugepaged_max_ptes_none
>>> + * values of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and
>>> + * no mTHP collapse will be attempted.
>>> + *
>>> + * Return: Maximum number of empty PTEs allowed for the collapse operation
>>> + */
>>> +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
>>> +{
>>> + /* ignore max_ptes_none limits */
>>> + if (full_scan)
>>> + return HPAGE_PMD_NR - 1;
>>> +
>>> + if (!is_mthp_order(order))
>>> + return khugepaged_max_ptes_none;
>>> +
>>> + /* Zero/non-present collapse disabled. */
>>> + if (!khugepaged_max_ptes_none)
>>> + return 0;
>>> +
>>> + if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
>>> + return (1 << order) - 1;
>>> +
>>> + pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
>>> + HPAGE_PMD_NR - 1);
>>> + return -EINVAL;
>>> +}
>>> +
>>> void khugepaged_enter_vma(struct vm_area_struct *vma,
>>> vm_flags_t vm_flags)
>>> {
>>> @@ -550,7 +588,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>>> pte_t *_pte;
>>> int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>>> const unsigned long nr_pages = 1UL << order;
>>> - int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
>>> + int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
>>> +
>>> + if (max_ptes_none == -EINVAL)
>>> + goto out;
>>
>> After testing your patchset, I hit the following crash. The reason is
>> that when 'max_ptes_none' is -EINVAL here, it shouldn't goto out to call
>> release_pte_pages(), because the '_pte' hasn't been initialized at this
>> point, and there's no need to release folios either.
>>
>> After applying the fix below, the crash issue is resolved. I'm not sure
>> whether Andrew will help fix this or if you will send a new version to
>> address this issue.
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 8cffaf59ced8..2e8171a6d7df 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -646,7 +646,7 @@ static int __collapse_huge_page_isolate(struct
>> vm_area_struct *vma,
>> int max_ptes_none = collapse_max_ptes_none(order,
>> !cc->is_khugepaged);
>>
>> if (max_ptes_none == -EINVAL)
>> - goto out;
>> + return result;
>>
>> for (_pte = pte; _pte < pte + nr_pages;
>> _pte++, addr += PAGE_SIZE) {
>>
>> "
>> [ 565.319345] Unable to handle kernel paging request at virtual address
>> fffffffffffffffa
>> .......
>> [ 565.319409] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001f8549a000
>> [ 565.319416] [fffffffffffffffa] pgd=0000001f85f2a403,
>> p4d=0000001f85f2a403, pud=0000001f85f2b403, pmd=0000000000000000
>> [ 565.319427] Internal error: Oops: 0000000096000006 [#1] SMP
>> .......
>> [ 565.326733] pc : release_pte_pages+0x68/0x178
>> [ 565.326960] lr : __collapse_huge_page_isolate+0xc0/0x748
>> [ 565.327232] sp : ffff800083593910
>> .......
>> [ 565.331476] Call trace:
>> [ 565.331664] release_pte_pages+0x68/0x178 (P)
>> [ 565.331940] __collapse_huge_page_isolate+0xc0/0x748
>> [ 565.332249] collapse_huge_page+0x4cc/0xa70
>> [ 565.332510] mthp_collapse+0x254/0x2a8
>> [ 565.332754] collapse_scan_pmd+0x5a0/0x6d8
>> [ 565.333010] collapse_single_pmd+0x214/0x288
>> [ 565.333275] collapse_scan_mm_slot.constprop.0+0x2ac/0x460
>> [ 565.333617] khugepaged+0x204/0x2c8
>> [ 565.333992] kthread+0xf8/0x110
>> [ 565.334368] ret_from_fork+0x10/0x20
>> "
>>
>>>
>>> for (_pte = pte; _pte < pte + nr_pages;
>>> _pte++, addr += PAGE_SIZE) {
>>
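
For reference, the decision table of the quoted helper can be modeled in
userspace roughly as follows. This is only a sketch under assumptions: the
MODEL_* constants assume 4K pages with a PMD order of 9 (matching the
"max_ptes_none=511" case in the commit message), model_max_ptes_none() is a
hypothetical name, and the configured value is passed as a parameter rather
than read from khugepaged_max_ptes_none. The model returns int so that the
-EINVAL comparison done by the caller is explicit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PMD_ORDER 9			/* assumes 4K pages */
#define MODEL_PMD_NR    (1 << MODEL_PMD_ORDER)	/* 512 PTEs per PMD */
#define MODEL_EINVAL    22

/* Hypothetical model of the limit chosen for a collapse to 'order'. */
static int model_max_ptes_none(unsigned int order, bool full_scan,
			       int configured)
{
	assert(order <= MODEL_PMD_ORDER);

	/* Full scans ignore the configured limit. */
	if (full_scan)
		return MODEL_PMD_NR - 1;

	/* PMD-sized collapse: use the configured value as is. */
	if (order == MODEL_PMD_ORDER)
		return configured;

	/* mTHP: never introduce new none-pages. */
	if (configured == 0)
		return 0;

	/* mTHP: allow all but one PTE of the target order to be none. */
	if (configured == MODEL_PMD_NR - 1)
		return (1 << order) - 1;

	/* Any other configured value is unsupported for mTHP. */
	return -MODEL_EINVAL;
}
```

For example, with a configured value of 511 an order-4 collapse gets a limit
of 15, while a configured value of 256 yields -MODEL_EINVAL for any mTHP
order, which is the case that reached the 'goto out' path.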