* [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-01 17:46 [PATCH v13 mm-new 00/16] khugepaged: mTHP support Nico Pache
@ 2025-12-01 17:46 ` Nico Pache
  2025-12-02  7:53   ` Baolin Wang
                     ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Nico Pache @ 2025-12-01 17:46 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, linux-mm, linux-doc
  Cc: david, ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	ryan.roberts, dev.jain, corbet, rostedt, mhiramat,
	mathieu.desnoyers, akpm, baohua, willy, peterx, wangkefeng.wang,
	usamaarif642, sunnanyong, vishal.moola, thomas.hellstrom, yang,
	kas, aarcange, raquini, anshuman.khandual, catalin.marinas, tiwai,
	will, dave.hansen, jack, cl, jglisse, surenb, zokeefe, hannes,
	rientjes, mhocko, rdunlap, hughd, richard.weiyang, lance.yang,
	vbabka, rppt, jannh, pfalcato

The current mechanism for determining mTHP collapse scales the
khugepaged_max_ptes_none value based on the target order. This
introduces an undesirable feedback loop, or "creep", when max_ptes_none
is set to a value greater than HPAGE_PMD_NR / 2.

With this configuration, a successful collapse to order N will populate
enough pages to satisfy the collapse condition on order N+1 on the next
scan. This leads to unnecessary work and memory churn.
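
For illustration only (not part of the patch), the creep can be sketched in
userspace C. The sketch assumes HPAGE_PMD_ORDER is 9 (x86-64 with 4k pages),
that a collapse leaves its range fully populated, and that the other half of
the next-higher order is entirely empty. The helper names are hypothetical;
only the shift expression is taken from the line this patch removes.

```c
#include <stdio.h>

#define HPAGE_PMD_ORDER 9	/* assumption: x86-64 with 4 KiB base pages */

/* Pre-patch scaling: the expression removed from __collapse_huge_page_isolate(). */
static int scaled_max_ptes_none(int max_ptes_none, int order)
{
	return max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

/*
 * Hypothetical model: start with one fully populated folio of start_order
 * and keep rescanning. Each rescan looks at the next order, where the other
 * half of the range is assumed empty, and collapses if the empty-PTE count
 * fits under the scaled limit. Returns the order at which the creep stops.
 */
static int creep_final_order(int max_ptes_none, int start_order)
{
	int order = start_order;

	while (order < HPAGE_PMD_ORDER) {
		int next = order + 1;
		int none = (1 << next) - (1 << order);	/* empty half */

		if (none > scaled_max_ptes_none(max_ptes_none, next))
			break;
		order = next;	/* collapsed: range is now fully populated */
	}
	return order;
}
```

Under these assumptions, creep_final_order(255, 2) stays at order 2, while
creep_final_order(300, 2) climbs all the way to order 9: every successful
collapse makes the next order eligible on the following scan.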

To fix this issue, introduce a helper function that will limit mTHP
collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
This effectively supports two modes:

- max_ptes_none=0: never introduce new none-pages for mTHP collapse.
- max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
  available mTHP order.

This removes the possibility of "creep", while not modifying any uAPI
expectations. A warning will be emitted if any non-supported
max_ptes_none value is configured with mTHP enabled.

The limits can be ignored by passing full_scan=true. This is useful for
madvise_collapse (which ignores limits), or in the case of
collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
collapse is available.
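
As a rough userspace model of the rules described above (a sketch, not the
kernel code): the function below takes the tunable as a parameter instead of
reading khugepaged_max_ptes_none, assumes HPAGE_PMD_ORDER is 9, and assumes
that "mTHP order" simply means any order other than HPAGE_PMD_ORDER. The
name model_max_ptes_none is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER	9			/* assumption: 4 KiB base pages */
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)

/*
 * Hypothetical model of the helper's decision table. Returns the number of
 * empty PTEs tolerated for a collapse to 'order', or -EINVAL when the
 * tunable is not one of the two values supported for mTHP collapse.
 */
static int model_max_ptes_none(unsigned int tunable, unsigned int order,
			       bool full_scan)
{
	if (full_scan)				/* limits are ignored */
		return HPAGE_PMD_NR - 1;

	if (order == HPAGE_PMD_ORDER)		/* PMD collapse: tunable as-is */
		return (int)tunable;

	if (tunable == 0)			/* never add new none-pages */
		return 0;

	if (tunable == HPAGE_PMD_NR - 1)	/* all but one PTE may be empty */
		return (1 << order) - 1;

	return -EINVAL;				/* unsupported for mTHP */
}
```

With these assumptions, a tunable of 511 yields 15 for an order-4 collapse,
a tunable of 0 yields 0, and any value in between yields -EINVAL for mTHP
orders while still being used unchanged for a PMD-sized collapse.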

Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8dab49c53128..f425238d5d4f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
 		wake_up_interruptible(&khugepaged_wait);
 }
 
+/**
+ * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
+ * @order: The folio order being collapsed to
+ * @full_scan: Whether this is a full scan (ignore limits)
+ *
+ * For madvise-triggered collapses (full_scan=true), all limits are bypassed
+ * and up to HPAGE_PMD_NR - 1 empty PTEs are allowed.
+ *
+ * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
+ * khugepaged_max_ptes_none value.
+ *
+ * For mTHP collapses, we currently only support khugepaged_max_ptes_none
+ * values of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and
+ * no mTHP collapse will be attempted.
+ *
+ * Return: Maximum number of empty PTEs allowed for the collapse operation
+ */
+static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
+{
+	/* ignore max_ptes_none limits */
+	if (full_scan)
+		return HPAGE_PMD_NR - 1;
+
+	if (!is_mthp_order(order))
+		return khugepaged_max_ptes_none;
+
+	/* Zero/non-present collapse disabled. */
+	if (!khugepaged_max_ptes_none)
+		return 0;
+
+	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
+		return (1 << order) - 1;
+
+	pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
+		      HPAGE_PMD_NR - 1);
+	return -EINVAL;
+}
+
 void khugepaged_enter_vma(struct vm_area_struct *vma,
 			  vm_flags_t vm_flags)
 {
@@ -550,7 +588,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	const unsigned long nr_pages = 1UL << order;
-	int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
+	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
+
+	if (max_ptes_none == -EINVAL)
+		goto out;
 
 	for (_pte = pte; _pte < pte + nr_pages;
 	     _pte++, addr += PAGE_SIZE) {
-- 
2.51.1



* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
@ 2025-12-02  7:53   ` Baolin Wang
  2025-12-03 13:40   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2025-12-02  7:53 UTC (permalink / raw)
  To: Nico Pache, linux-kernel, linux-trace-kernel, linux-mm, linux-doc
  Cc: david, ziy, lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain,
	corbet, rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy,
	peterx, wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
	thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
	catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
	surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
	richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato



On 2025/12/2 01:46, Nico Pache wrote:
> The current mechanism for determining mTHP collapse scales the
> khugepaged_max_ptes_none value based on the target order. This
> introduces an undesirable feedback loop, or "creep", when max_ptes_none
> is set to a value greater than HPAGE_PMD_NR / 2.
> 
> With this configuration, a successful collapse to order N will populate
> enough pages to satisfy the collapse condition on order N+1 on the next
> scan. This leads to unnecessary work and memory churn.
> 
> To fix this issue, introduce a helper function that will limit mTHP
> collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
> This effectively supports two modes:
> 
> - max_ptes_none=0: never introduce new none-pages for mTHP collapse.
> - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
>    available mTHP order.
> 
> This removes the possibility of "creep", while not modifying any uAPI
> expectations. A warning will be emitted if any non-supported
> max_ptes_none value is configured with mTHP enabled.
> 
> The limits can be ignored by passing full_scan=true. This is useful for
> madvise_collapse (which ignores limits), or in the case of
> collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
> collapse is available.
> 
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>   mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8dab49c53128..f425238d5d4f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
>   		wake_up_interruptible(&khugepaged_wait);
>   }
>   
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
> + * @order: The folio order being collapsed to
> + * @full_scan: Whether this is a full scan (ignore limits)
> + *
> + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
> + * and up to HPAGE_PMD_NR - 1 empty PTEs are allowed.
> + *
> + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
> + * khugepaged_max_ptes_none value.
> + *
> + * For mTHP collapses, we currently only support khugepaged_max_ptes_none
> + * values of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and
> + * no mTHP collapse will be attempted.
> + *
> + * Return: Maximum number of empty PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
> +{
> +	/* ignore max_ptes_none limits */
> +	if (full_scan)
> +		return HPAGE_PMD_NR - 1;
> +
> +	if (!is_mthp_order(order))
> +		return khugepaged_max_ptes_none;
> +
> +	/* Zero/non-present collapse disabled. */
> +	if (!khugepaged_max_ptes_none)
> +		return 0;
> +
> +	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> +		return (1 << order) - 1;
> +
> +	pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
> +		      HPAGE_PMD_NR - 1);
> +	return -EINVAL;
> +}

Thanks. That aligns with what we talked about previously. So
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
  2025-12-02  7:53   ` Baolin Wang
@ 2025-12-03 13:40   ` kernel test robot
  2025-12-03 21:02   ` Nico Pache
  2025-12-16  8:12   ` Baolin Wang
  3 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-12-03 13:40 UTC (permalink / raw)
  To: Nico Pache; +Cc: llvm, oe-kbuild-all

Hi Nico,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20251203]
[cannot apply to linus/master v6.18]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nico-Pache/khugepaged-rename-hpage_collapse_-to-collapse_/20251202-015150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251201174627.23295-8-npache%40redhat.com
patch subject: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20251203/202512032107.44KoCA71-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251203/202512032107.44KoCA71-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512032107.44KoCA71-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/khugepaged.c:593:6: warning: variable '_pte' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
     593 |         if (max_ptes_none == -EINVAL)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   mm/khugepaged.c:724:25: note: uninitialized use occurs here
     724 |         release_pte_pages(pte, _pte, compound_pagelist);
         |                                ^~~~
   mm/khugepaged.c:593:2: note: remove the 'if' if its condition is always false
     593 |         if (max_ptes_none == -EINVAL)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     594 |                 goto out;
         |                 ~~~~~~~~
   mm/khugepaged.c:588:13: note: initialize the variable '_pte' to silence this warning
     588 |         pte_t *_pte;
         |                    ^
         |                     = NULL
   1 warning generated.
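
For readers unfamiliar with this class of warning, the pattern can be
reduced to a small standalone sketch. Everything here is hypothetical
(release_range() merely stands in for a cleanup helper such as
release_pte_pages(), and scan_entries() for the isolate loop); it only
illustrates why an early "goto out" placed before a for-loop initialiser
needs the cursor to be initialised at its declaration.

```c
#include <stddef.h>

/* Hypothetical cleanup helper: counts the entries in [begin, end). */
static size_t release_range(const int *begin, const int *end)
{
	size_t released = 0;

	while (end > begin) {
		end--;
		released++;
	}
	return released;
}

/*
 * Hypothetical scan loop. If 'cur' were declared without an initialiser and
 * only assigned in the for-loop init clause, the early 'goto out' below
 * would reach release_range() with an indeterminate pointer, which is the
 * situation -Wsometimes-uninitialized reports. Initialising it at the
 * declaration makes the early exit release nothing.
 */
static size_t scan_entries(const int *entries, size_t nr, int limit)
{
	const int *cur = entries;

	if (limit < 0)		/* invalid limit: bail out before scanning */
		goto out;

	for (; cur < entries + nr; cur++) {
		if (*cur == 0)	/* stop at the first empty entry */
			goto out;
	}
out:
	return release_range(entries, cur);
}
```

In this sketch an invalid limit releases zero entries, while a scan that
stops at index 2 releases the two entries already visited.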


vim +593 mm/khugepaged.c

   580	
   581	static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
   582			unsigned long start_addr, pte_t *pte, struct collapse_control *cc,
   583			unsigned int order, struct list_head *compound_pagelist)
   584	{
   585		struct page *page = NULL;
   586		struct folio *folio = NULL;
   587		unsigned long addr = start_addr;
   588		pte_t *_pte;
   589		int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
   590		const unsigned long nr_pages = 1UL << order;
   591		int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
   592	
 > 593		if (max_ptes_none == -EINVAL)
   594			goto out;
   595	
   596		for (_pte = pte; _pte < pte + nr_pages;
   597		     _pte++, addr += PAGE_SIZE) {
   598			pte_t pteval = ptep_get(_pte);
   599			if (pte_none_or_zero(pteval)) {
   600				++none_or_zero;
   601				if (!userfaultfd_armed(vma) &&
   602				    (!cc->is_khugepaged ||
   603				     none_or_zero <= max_ptes_none)) {
   604					continue;
   605				} else {
   606					result = SCAN_EXCEED_NONE_PTE;
   607					count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
   608					goto out;
   609				}
   610			}
   611			if (!pte_present(pteval)) {
   612				result = SCAN_PTE_NON_PRESENT;
   613				goto out;
   614			}
   615			if (pte_uffd_wp(pteval)) {
   616				result = SCAN_PTE_UFFD_WP;
   617				goto out;
   618			}
   619			page = vm_normal_page(vma, addr, pteval);
   620			if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
   621				result = SCAN_PAGE_NULL;
   622				goto out;
   623			}
   624	
   625			folio = page_folio(page);
   626			VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
   627	
   628			/* See collapse_scan_pmd(). */
   629			if (folio_maybe_mapped_shared(folio)) {
   630				++shared;
   631				/*
   632				 * TODO: Support shared pages without leading to further
   633				 * mTHP collapses. Currently bringing in new pages via
   634				 * shared may cause a future higher order collapse on a
   635				 * rescan of the same range.
   636				 */
   637				if (is_mthp_order(order) || (cc->is_khugepaged &&
   638				    shared > khugepaged_max_ptes_shared)) {
   639					result = SCAN_EXCEED_SHARED_PTE;
   640					count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
   641					goto out;
   642				}
   643			}
   644	
   645			if (folio_test_large(folio)) {
   646				struct folio *f;
   647	
   648				/*
   649				 * Check if we have dealt with the compound page
   650				 * already
   651				 */
   652				list_for_each_entry(f, compound_pagelist, lru) {
   653					if (folio == f)
   654						goto next;
   655				}
   656			}
   657	
   658			/*
   659			 * We can do it before folio_isolate_lru because the
   660			 * folio can't be freed from under us. NOTE: PG_lock
   661			 * is needed to serialize against split_huge_page
   662			 * when invoked from the VM.
   663			 */
   664			if (!folio_trylock(folio)) {
   665				result = SCAN_PAGE_LOCK;
   666				goto out;
   667			}
   668	
   669			/*
   670			 * Check if the page has any GUP (or other external) pins.
   671			 *
   672			 * The page table that maps the page has been already unlinked
   673			 * from the page table tree and this process cannot get
   674			 * an additional pin on the page.
   675			 *
   676			 * New pins can come later if the page is shared across fork,
   677			 * but not from this process. The other process cannot write to
   678			 * the page, only trigger CoW.
   679			 */
   680			if (folio_expected_ref_count(folio) != folio_ref_count(folio)) {
   681				folio_unlock(folio);
   682				result = SCAN_PAGE_COUNT;
   683				goto out;
   684			}
   685	
   686			/*
   687			 * Isolate the page to avoid collapsing an hugepage
   688			 * currently in use by the VM.
   689			 */
   690			if (!folio_isolate_lru(folio)) {
   691				folio_unlock(folio);
   692				result = SCAN_DEL_PAGE_LRU;
   693				goto out;
   694			}
   695			node_stat_mod_folio(folio,
   696					NR_ISOLATED_ANON + folio_is_file_lru(folio),
   697					folio_nr_pages(folio));
   698			VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
   699			VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
   700	
   701			if (folio_test_large(folio))
   702				list_add_tail(&folio->lru, compound_pagelist);
   703	next:
   704			/*
   705			 * If collapse was initiated by khugepaged, check that there is
   706			 * enough young pte to justify collapsing the page
   707			 */
   708			if (cc->is_khugepaged &&
   709			    (pte_young(pteval) || folio_test_young(folio) ||
   710			     folio_test_referenced(folio) ||
   711			     mmu_notifier_test_young(vma->vm_mm, addr)))
   712				referenced++;
   713		}
   714	
   715		if (unlikely(cc->is_khugepaged && !referenced)) {
   716			result = SCAN_LACK_REFERENCED_PAGE;
   717		} else {
   718			result = SCAN_SUCCEED;
   719			trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
   720							    referenced, result);
   721			return result;
   722		}
   723	out:
   724		release_pte_pages(pte, _pte, compound_pagelist);
   725		trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
   726						    referenced, result);
   727		return result;
   728	}
   729	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
  2025-12-02  7:53   ` Baolin Wang
  2025-12-03 13:40   ` kernel test robot
@ 2025-12-03 21:02   ` Nico Pache
  2025-12-16  8:12   ` Baolin Wang
  3 siblings, 0 replies; 8+ messages in thread
From: Nico Pache @ 2025-12-03 21:02 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-trace-kernel, linux-mm, linux-doc
  Cc: david, ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	ryan.roberts, dev.jain, corbet, rostedt, mhiramat,
	mathieu.desnoyers, baohua, willy, peterx, wangkefeng.wang,
	usamaarif642, sunnanyong, vishal.moola, thomas.hellstrom, yang,
	kas, aarcange, raquini, anshuman.khandual, catalin.marinas, tiwai,
	will, dave.hansen, jack, cl, jglisse, surenb, zokeefe, hannes,
	rientjes, mhocko, rdunlap, hughd, richard.weiyang, lance.yang,
	vbabka, rppt, jannh, pfalcato

Hi Andrew,

The bot has reported a potential uninitialized use of a variable.

Can you please squash the following fixup into this commit?

Thank you,
Nico

----8<----

From 846f79d91a25ebad76cbab3690ae315cfe3cf278 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Wed, 3 Dec 2025 13:42:18 -0700
Subject: [PATCH] khugepaged: fixup uninitialized _pte variable

There is a potential use of an uninitialized variable after
`khugepaged: introduce collapse_max_ptes_none helper function`.

Andrew, can you please append this to patch 7 of my series?

As reported by the kernel test robot:
>> mm/khugepaged.c:593:6: warning: variable '_pte' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
     593 |         if (max_ptes_none == -EINVAL)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   mm/khugepaged.c:724:25: note: uninitialized use occurs here
     724 |         release_pte_pages(pte, _pte, compound_pagelist);
         |                                ^~~~
   mm/khugepaged.c:593:2: note: remove the 'if' if its condition is always false
     593 |         if (max_ptes_none == -EINVAL)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     594 |                 goto out;
         |                 ~~~~~~~~
   mm/khugepaged.c:588:13: note: initialize the variable '_pte' to silence this warning
     588 |         pte_t *_pte;
         |                    ^
         |                     = NULL
   1 warning generated.

Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f425238d5d4f..7c7d04d6737e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -585,7 +585,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr = start_addr;
-	pte_t *_pte;
+	pte_t *_pte = pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	const unsigned long nr_pages = 1UL << order;
 	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
@@ -593,8 +593,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	if (max_ptes_none == -EINVAL)
 		goto out;
 
-	for (_pte = pte; _pte < pte + nr_pages;
-	     _pte++, addr += PAGE_SIZE) {
+	for (; _pte < pte + nr_pages; _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none_or_zero(pteval)) {
 			++none_or_zero;

--
2.52.0




* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
@ 2025-12-08 18:48 kernel test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-12-08 18:48 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20251201174627.23295-8-npache@redhat.com>
References: <20251201174627.23295-8-npache@redhat.com>
TO: Nico Pache <npache@redhat.com>

Hi Nico,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master next-20251208]
[cannot apply to v6.18]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nico-Pache/khugepaged-rename-hpage_collapse_-to-collapse_/20251202-015150
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20251201174627.23295-8-npache%40redhat.com
patch subject: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
:::::: branch date: 7 days ago
:::::: commit date: 7 days ago
config: x86_64-randconfig-161-20251208 (https://download.01.org/0day-ci/archive/20251209/202512090239.PI28RhRo-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202512090239.PI28RhRo-lkp@intel.com/

smatch warnings:
mm/khugepaged.c:724 __collapse_huge_page_isolate() error: uninitialized symbol '_pte'.

vim +/_pte +724 mm/khugepaged.c

b46e756f5e4703 Kirill A. Shutemov    2016-07-26  580  
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  581  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
ffd26dcc465d32 Nico Pache            2025-12-01  582  		unsigned long start_addr, pte_t *pte, struct collapse_control *cc,
ffd26dcc465d32 Nico Pache            2025-12-01  583  		unsigned int order, struct list_head *compound_pagelist)
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  584  {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  585  	struct page *page = NULL;
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  586) 	struct folio *folio = NULL;
1acc369373008b Wei Yang              2025-09-22  587  	unsigned long addr = start_addr;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  588  	pte_t *_pte;
50ad2f24b3b48c Zach O'Keefe          2022-07-06  589  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
ffd26dcc465d32 Nico Pache            2025-12-01  590  	const unsigned long nr_pages = 1UL << order;
04313aec37d2ac Nico Pache            2025-12-01  591  	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
04313aec37d2ac Nico Pache            2025-12-01  592  
04313aec37d2ac Nico Pache            2025-12-01  593  	if (max_ptes_none == -EINVAL)
04313aec37d2ac Nico Pache            2025-12-01  594  		goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  595  
ffd26dcc465d32 Nico Pache            2025-12-01  596  	for (_pte = pte; _pte < pte + nr_pages;
1acc369373008b Wei Yang              2025-09-22  597  	     _pte++, addr += PAGE_SIZE) {
c33c794828f212 Ryan Roberts          2023-06-12  598  		pte_t pteval = ptep_get(_pte);
074f027d15c10c Lance Yang            2025-10-20  599  		if (pte_none_or_zero(pteval)) {
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  600  			++none_or_zero;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  601  			if (!userfaultfd_armed(vma) &&
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  602  			    (!cc->is_khugepaged ||
ffd26dcc465d32 Nico Pache            2025-12-01  603  			     none_or_zero <= max_ptes_none)) {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  604  				continue;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  605  			} else {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  606  				result = SCAN_EXCEED_NONE_PTE;
e9ea874a8ffb0f Yang Yang             2022-01-14  607  				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  608  				goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  609  			}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  610  		}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  611  		if (!pte_present(pteval)) {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  612  			result = SCAN_PTE_NON_PRESENT;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  613  			goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  614  		}
dd47ac428c3f5f Peter Xu              2023-04-05  615  		if (pte_uffd_wp(pteval)) {
dd47ac428c3f5f Peter Xu              2023-04-05  616  			result = SCAN_PTE_UFFD_WP;
dd47ac428c3f5f Peter Xu              2023-04-05  617  			goto out;
dd47ac428c3f5f Peter Xu              2023-04-05  618  		}
1acc369373008b Wei Yang              2025-09-22  619  		page = vm_normal_page(vma, addr, pteval);
3218f8712d6bba Alex Sierra           2022-07-15  620  		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  621  			result = SCAN_PAGE_NULL;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  622  			goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  623  		}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  624  
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  625) 		folio = page_folio(page);
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  626) 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  627  
3f7fc2c6e09310 Nico Pache            2025-12-01  628  		/* See collapse_scan_pmd(). */
003fde4492c88a David Hildenbrand     2025-03-03  629  		if (folio_maybe_mapped_shared(folio)) {
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  630  			++shared;
ffd26dcc465d32 Nico Pache            2025-12-01  631  			/*
ffd26dcc465d32 Nico Pache            2025-12-01  632  			 * TODO: Support shared pages without leading to further
ffd26dcc465d32 Nico Pache            2025-12-01  633  			 * mTHP collapses. Currently bringing in new pages via
ffd26dcc465d32 Nico Pache            2025-12-01  634  			 * shared may cause a future higher order collapse on a
ffd26dcc465d32 Nico Pache            2025-12-01  635  			 * rescan of the same range.
ffd26dcc465d32 Nico Pache            2025-12-01  636  			 */
ffd26dcc465d32 Nico Pache            2025-12-01  637  			if (is_mthp_order(order) || (cc->is_khugepaged &&
ffd26dcc465d32 Nico Pache            2025-12-01  638  			    shared > khugepaged_max_ptes_shared)) {
71a2c112a0f6da Kirill A. Shutemov    2020-06-03  639  				result = SCAN_EXCEED_SHARED_PTE;
e9ea874a8ffb0f Yang Yang             2022-01-14  640  				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
71a2c112a0f6da Kirill A. Shutemov    2020-06-03  641  				goto out;
71a2c112a0f6da Kirill A. Shutemov    2020-06-03  642  			}
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  643  		}
71a2c112a0f6da Kirill A. Shutemov    2020-06-03  644  
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  645) 		if (folio_test_large(folio)) {
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  646) 			struct folio *f;
fece2029a9e65b Kirill A. Shutemov    2018-03-22  647  
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  648  			/*
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  649  			 * Check if we have dealt with the compound page
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  650  			 * already
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  651  			 */
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  652) 			list_for_each_entry(f, compound_pagelist, lru) {
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  653) 				if (folio == f)
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  654  					goto next;
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  655  			}
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  656  		}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  657  
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  658  		/*
775d28fd45a2f5 Kefeng Wang           2024-08-26  659  		 * We can do it before folio_isolate_lru because the
775d28fd45a2f5 Kefeng Wang           2024-08-26  660  		 * folio can't be freed from under us. NOTE: PG_lock
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  661  		 * is needed to serialize against split_huge_page
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  662  		 * when invoked from the VM.
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  663  		 */
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  664) 		if (!folio_trylock(folio)) {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  665  			result = SCAN_PAGE_LOCK;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  666  			goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  667  		}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  668  
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  669  		/*
9445689f3b6170 Kirill A. Shutemov    2020-06-03  670  		 * Check if the page has any GUP (or other external) pins.
9445689f3b6170 Kirill A. Shutemov    2020-06-03  671  		 *
9445689f3b6170 Kirill A. Shutemov    2020-06-03  672  		 * The page table that maps the page has been already unlinked
9445689f3b6170 Kirill A. Shutemov    2020-06-03  673  		 * from the page table tree and this process cannot get
f0953a1bbaca71 Ingo Molnar           2021-05-06  674  		 * an additional pin on the page.
9445689f3b6170 Kirill A. Shutemov    2020-06-03  675  		 *
9445689f3b6170 Kirill A. Shutemov    2020-06-03  676  		 * New pins can come later if the page is shared across fork,
9445689f3b6170 Kirill A. Shutemov    2020-06-03  677  		 * but not from this process. The other process cannot write to
9445689f3b6170 Kirill A. Shutemov    2020-06-03  678  		 * the page, only trigger CoW.
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  679  		 */
0b43b8bc8ef88b Shivank Garg          2025-05-26  680  		if (folio_expected_ref_count(folio) != folio_ref_count(folio)) {
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  681) 			folio_unlock(folio);
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  682  			result = SCAN_PAGE_COUNT;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  683  			goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  684  		}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  685  
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  686  		/*
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  687  		 * Isolate the page to avoid collapsing an hugepage
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  688  		 * currently in use by the VM.
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  689  		 */
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  690) 		if (!folio_isolate_lru(folio)) {
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  691) 			folio_unlock(folio);
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  692  			result = SCAN_DEL_PAGE_LRU;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  693  			goto out;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  694  		}
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  695) 		node_stat_mod_folio(folio,
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  696) 				NR_ISOLATED_ANON + folio_is_file_lru(folio),
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  697) 				folio_nr_pages(folio));
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  698) 		VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  699) 		VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  700  
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  701) 		if (folio_test_large(folio))
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  702) 			list_add_tail(&folio->lru, compound_pagelist);
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03  703  next:
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  704  		/*
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  705  		 * If collapse was initiated by khugepaged, check that there is
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  706  		 * enough young pte to justify collapsing the page
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  707  		 */
d8ea7cc8547ca3 Zach O'Keefe          2022-07-06  708  		if (cc->is_khugepaged &&
8dd1e896735f6e Vishal Moola (Oracle  2023-10-20  709) 		    (pte_young(pteval) || folio_test_young(folio) ||
1acc369373008b Wei Yang              2025-09-22  710  		     folio_test_referenced(folio) ||
1acc369373008b Wei Yang              2025-09-22  711  		     mmu_notifier_test_young(vma->vm_mm, addr)))
0db501f7a34c11 Ebru Akagunduz        2016-07-26  712  			referenced++;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  713  	}
74e579bf231a33 Miaohe Lin            2021-05-04  714  
62b98015d98815 Dev Jain              2025-09-08  715  	if (unlikely(cc->is_khugepaged && !referenced)) {
74e579bf231a33 Miaohe Lin            2021-05-04  716  		result = SCAN_LACK_REFERENCED_PAGE;
74e579bf231a33 Miaohe Lin            2021-05-04  717  	} else {
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  718  		result = SCAN_SUCCEED;
50dbe531291abf Fan Ni                2025-04-24  719  		trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
473b73222f3d8c Dev Jain              2025-09-08  720  						    referenced, result);
50ad2f24b3b48c Zach O'Keefe          2022-07-06  721  		return result;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  722  	}
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  723  out:
5503fbf2b0b80c Kirill A. Shutemov    2020-06-03 @724  	release_pte_pages(pte, _pte, compound_pagelist);
50dbe531291abf Fan Ni                2025-04-24  725  	trace_mm_collapse_huge_page_isolate(folio, none_or_zero,
473b73222f3d8c Dev Jain              2025-09-08  726  					    referenced, result);
50ad2f24b3b48c Zach O'Keefe          2022-07-06  727  	return result;
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  728  }
b46e756f5e4703 Kirill A. Shutemov    2016-07-26  729  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
                     ` (2 preceding siblings ...)
  2025-12-03 21:02   ` Nico Pache
@ 2025-12-16  8:12   ` Baolin Wang
  2025-12-16 23:26     ` Nico Pache
  3 siblings, 1 reply; 8+ messages in thread
From: Baolin Wang @ 2025-12-16  8:12 UTC (permalink / raw)
  To: Nico Pache, linux-kernel, linux-trace-kernel, linux-mm, linux-doc
  Cc: david, ziy, lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain,
	corbet, rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy,
	peterx, wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
	thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
	catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
	surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
	richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato

Hi Nico,

On 2025/12/2 01:46, Nico Pache wrote:
> The current mechanism for determining mTHP collapse scales the
> khugepaged_max_ptes_none value based on the target order. This
> introduces an undesirable feedback loop, or "creep", when max_ptes_none
> is set to a value greater than HPAGE_PMD_NR / 2.
> 
> With this configuration, a successful collapse to order N will populate
> enough pages to satisfy the collapse condition on order N+1 on the next
> scan. This leads to unnecessary work and memory churn.
> 
> To fix this issue, introduce a helper function that will limit mTHP
> collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
> This effectively supports two modes:
> 
> - max_ptes_none=0: never introduce new none-pages for mTHP collapse.
> - max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
>    available mTHP order.
> 
> This removes the possibility of "creep", while not modifying any uAPI
> expectations. A warning will be emitted if any non-supported
> max_ptes_none value is configured with mTHP enabled.
> 
> The limits can be ignored by passing full_scan=true; this is useful for
> madvise_collapse (which ignores limits), or in the case of
> collapse_scan_pmd(), allows the full PMD to be scanned when mTHP
> collapse is available.
> 
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>   mm/khugepaged.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8dab49c53128..f425238d5d4f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -463,6 +463,44 @@ void __khugepaged_enter(struct mm_struct *mm)
>   		wake_up_interruptible(&khugepaged_wait);
>   }
>   
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse
> + * @order: The folio order being collapsed to
> + * @full_scan: Whether this is a full scan (ignore limits)
> + *
> + * For madvise-triggered collapses (full_scan=true), all limits are bypassed
> + * and allow up to HPAGE_PMD_NR - 1 empty PTEs.
> + *
> + * For PMD-sized collapses (order == HPAGE_PMD_ORDER), use the configured
> + * khugepaged_max_ptes_none value.
> + *
> + * For mTHP collapses, we currently only support khugepaged_max_ptes_none values
> + * of 0 or (HPAGE_PMD_NR - 1). Any other value will emit a warning and no mTHP
> + * collapse will be attempted.
> + *
> + * Return: Maximum number of empty PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_none(unsigned int order, bool full_scan)
> +{
> +	/* ignore max_ptes_none limits */
> +	if (full_scan)
> +		return HPAGE_PMD_NR - 1;
> +
> +	if (!is_mthp_order(order))
> +		return khugepaged_max_ptes_none;
> +
> +	/* Zero/non-present collapse disabled. */
> +	if (!khugepaged_max_ptes_none)
> +		return 0;
> +
> +	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> +		return (1 << order) - 1;
> +
> +	pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %d\n",
> +		      HPAGE_PMD_NR - 1);
> +	return -EINVAL;
> +}
> +
>   void khugepaged_enter_vma(struct vm_area_struct *vma,
>   			  vm_flags_t vm_flags)
>   {
> @@ -550,7 +588,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>   	pte_t *_pte;
>   	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>   	const unsigned long nr_pages = 1UL << order;
> -	int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> +	int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);
> +
> +	if (max_ptes_none == -EINVAL)
> +		goto out;

After testing your patchset, I hit the following crash. The reason is 
that when 'max_ptes_none' is -EINVAL here, it shouldn't goto out to call 
release_pte_pages(), because the '_pte' hasn't been initialized at this 
point, and there's no need to release folios either.

After applying the fix below, the crash issue is resolved. I'm not sure 
whether Andrew will help fix this or if you will send a new version to 
address this issue.

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8cffaf59ced8..2e8171a6d7df 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -646,7 +646,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
         int max_ptes_none = collapse_max_ptes_none(order, !cc->is_khugepaged);

         if (max_ptes_none == -EINVAL)
-               goto out;
+               return result;

         for (_pte = pte; _pte < pte + nr_pages;
              _pte++, addr += PAGE_SIZE) {

"
[  565.319345] Unable to handle kernel paging request at virtual address fffffffffffffffa
.......
[  565.319409] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001f8549a000
[  565.319416] [fffffffffffffffa] pgd=0000001f85f2a403, p4d=0000001f85f2a403, pud=0000001f85f2b403, pmd=0000000000000000
[  565.319427] Internal error: Oops: 0000000096000006 [#1]  SMP
.......
[  565.326733] pc : release_pte_pages+0x68/0x178
[  565.326960] lr : __collapse_huge_page_isolate+0xc0/0x748
[  565.327232] sp : ffff800083593910
.......
[  565.331476] Call trace:
[  565.331664]  release_pte_pages+0x68/0x178 (P)
[  565.331940]  __collapse_huge_page_isolate+0xc0/0x748
[  565.332249]  collapse_huge_page+0x4cc/0xa70
[  565.332510]  mthp_collapse+0x254/0x2a8
[  565.332754]  collapse_scan_pmd+0x5a0/0x6d8
[  565.333010]  collapse_single_pmd+0x214/0x288
[  565.333275]  collapse_scan_mm_slot.constprop.0+0x2ac/0x460
[  565.333617]  khugepaged+0x204/0x2c8
[  565.333992]  kthread+0xf8/0x110
[  565.334368]  ret_from_fork+0x10/0x20
"

>   
>   	for (_pte = pte; _pte < pte + nr_pages;
>   	     _pte++, addr += PAGE_SIZE) {
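
The failure mode described above can be reproduced in miniature. What
follows is a minimal user-space sketch, not the kernel code:
model_isolate(), model_release() and the MODEL_SCAN_* values are
hypothetical stand-ins for __collapse_huge_page_isolate(),
release_pte_pages() and the real scan results. It assumes the cleanup
helper undoes work on the range that ends at the cursor it is handed, so
it may only run once the loop has assigned that cursor.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative result codes; the kernel's SCAN_* enum has different values. */
enum { MODEL_SCAN_FAIL, MODEL_SCAN_SUCCEED };

static int released;	/* number of entries the cleanup helper touched */

/* Stand-in for release_pte_pages(): undo work on [start, cursor). */
static void model_release(const int *start, const int *cursor)
{
	assert(start <= cursor);	/* a garbage cursor would violate this */
	while (cursor > start) {
		cursor--;
		released++;
	}
}

/* Stand-in for the isolate loop; a negative entry models a failed check. */
static int model_isolate(const int *pte, size_t nr_pages, int max_ptes_none)
{
	const int *_pte;	/* not assigned until the loop starts */
	int result = MODEL_SCAN_FAIL;

	/*
	 * Unsupported configuration: nothing has been isolated and _pte
	 * holds no valid value, so return instead of jumping to "out".
	 */
	if (max_ptes_none < 0)
		return result;

	for (_pte = pte; _pte < pte + nr_pages; _pte++) {
		if (*_pte < 0)
			goto out;
	}
	return MODEL_SCAN_SUCCEED;
out:
	model_release(pte, _pte);
	return result;
}
```

With the early return, the cleanup label is only reachable from inside
the loop, where the cursor is always valid.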



* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-16  8:12   ` Baolin Wang
@ 2025-12-16 23:26     ` Nico Pache
  2025-12-17  1:33       ` Baolin Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Nico Pache @ 2025-12-16 23:26 UTC (permalink / raw)
  To: Baolin Wang
  Cc: linux-kernel, linux-trace-kernel, linux-mm, linux-doc, david, ziy,
	lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain, corbet,
	rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy, peterx,
	wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
	thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
	catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
	surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
	richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato

On Tue, Dec 16, 2025 at 1:12 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> Hi Nico,

Hi Baolin! Thanks for testing :)

Did you happen to test with the changes I asked Andrew to append to
this commit?

Either way, I think your fixup makes more sense than mine.

Cheers,
-- Nico





* Re: [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function
  2025-12-16 23:26     ` Nico Pache
@ 2025-12-17  1:33       ` Baolin Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Baolin Wang @ 2025-12-17  1:33 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-kernel, linux-trace-kernel, linux-mm, linux-doc, david, ziy,
	lorenzo.stoakes, Liam.Howlett, ryan.roberts, dev.jain, corbet,
	rostedt, mhiramat, mathieu.desnoyers, akpm, baohua, willy, peterx,
	wangkefeng.wang, usamaarif642, sunnanyong, vishal.moola,
	thomas.hellstrom, yang, kas, aarcange, raquini, anshuman.khandual,
	catalin.marinas, tiwai, will, dave.hansen, jack, cl, jglisse,
	surenb, zokeefe, hannes, rientjes, mhocko, rdunlap, hughd,
	richard.weiyang, lance.yang, vbabka, rppt, jannh, pfalcato



On 2025/12/17 07:26, Nico Pache wrote:
> On Tue, Dec 16, 2025 at 1:12 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> Hi Nico,
> 
> Hi Baolin! Thanks for testing :)
> 
> Did you happen to test with the changes I asked Andrew to append to
> this commit?
> 
> Either way, I think your fixup makes more sense than mine.

Ah, I did not notice your fixup commit earlier, which seems to address 
this problem. And as I said, we don't need to call release_pte_pages() 
in this case.
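
For reference, the case in question is the -EINVAL return of the helper.
A rough user-space model of its return values is sketched below. The
model_ names and the max_ptes_none_tunable variable are hypothetical
stand-ins, and the constants assume 4K base pages (512 PTEs per PMD).
Note that the posted helper is declared as returning unsigned int while
also returning -EINVAL; the sketch returns int so that the sentinel is
representable without relying on a conversion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption for this model: 4K base pages, so a PMD maps 512 PTEs. */
#define HPAGE_PMD_ORDER	9
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)

/* Stand-in for the khugepaged_max_ptes_none sysfs tunable. */
static unsigned int max_ptes_none_tunable = HPAGE_PMD_NR - 1;

/* Hypothetical stand-in for is_mthp_order(): any order below PMD order. */
static bool model_is_mthp_order(unsigned int order)
{
	return order < HPAGE_PMD_ORDER;
}

/* Returns the allowed number of empty PTEs, or -EINVAL if unsupported. */
static int model_collapse_max_ptes_none(unsigned int order, bool full_scan)
{
	assert(order <= HPAGE_PMD_ORDER);

	/* madvise-style callers ignore the configured limit */
	if (full_scan)
		return HPAGE_PMD_NR - 1;

	/* PMD-sized collapse uses the tunable unchanged */
	if (!model_is_mthp_order(order))
		return (int)max_ptes_none_tunable;

	/* mTHP: only the two extreme settings are supported */
	if (max_ptes_none_tunable == 0)
		return 0;
	if (max_ptes_none_tunable == HPAGE_PMD_NR - 1)
		return (1 << order) - 1;

	return -EINVAL;
}
```

A caller can then test for a negative return before entering the PTE
loop, which is the point at which returning directly is safe.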




end of thread, other threads:[~2025-12-17  1:34 UTC | newest]

Thread overview: 8+ messages
2025-12-08 18:48 [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2025-12-01 17:46 [PATCH v13 mm-new 00/16] khugepaged: mTHP support Nico Pache
2025-12-01 17:46 ` [PATCH v13 mm-new 07/16] khugepaged: introduce collapse_max_ptes_none helper function Nico Pache
2025-12-02  7:53   ` Baolin Wang
2025-12-03 13:40   ` kernel test robot
2025-12-03 21:02   ` Nico Pache
2025-12-16  8:12   ` Baolin Wang
2025-12-16 23:26     ` Nico Pache
2025-12-17  1:33       ` Baolin Wang
