* [RFC] mm: restrict zero-page remapping to underused THP splits
@ 2026-05-08 17:05 Nico Pache
2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 3:21 ` Lance Yang
0 siblings, 2 replies; 12+ messages in thread
From: Nico Pache @ 2026-05-08 17:05 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: yuzhao, usamaarif642, lance.yang, baohua, dev.jain, ryan.roberts,
liam, baolin.wang, ziy, ljs, david, akpm, Nico Pache
Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
when splitting isolated thp"), splitting an anonymous THP remaps all
zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
This flag is set unconditionally for every anonymous folio split,
including splits triggered by KSM.
When KSM is enabled with THP=always, this causes two regressions:
1. use_zero_pages=1: KSM calls try_to_merge_one_page() which triggers
split_huge_page(). The split remaps all 512 zero-filled subpages to
the shared zeropage at once, freeing the entire 2MB THP when KSM only
intended to process a single 4KB page. This bypasses KSM's
pages_to_scan rate limiting, causing ~1GB to be freed almost
instantly.
2. use_zero_pages=0: The same split side-effect occurs through the
stable/unstable tree merge paths. Each pages_to_scan iteration
triggers an expensive split_huge_page() that silently frees 2MB,
while the scanner wastes cycles on tree searches for zero-filled
pages that were already freed as a side-effect.
Fix this by restricting TTU_USE_SHARED_ZEROPAGE to only the deferred
split shrinker path (deferred_split_scan), which is the only caller that
intentionally splits underused THPs to reclaim zero-filled subpages.
Introduce folio_split_underused() as a dedicated entry point that
passes is_underused_thp=true through __folio_split(), and use it from
deferred_split_scan(). All other split callers (KSM, compaction, etc.)
no longer get the zero-page remapping side-effect.
Reviewer notes: this patch is one of two potential approaches. This patch
turns off the zero-page freeing that all other callers have performed
since the noted commit, leaving only the underused shrinker with that
behavior. We could also take the opposite approach with something like
split_huge_page_no_zeropage() and call it from within KSM.
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Nico Pache <npache@redhat.com>
---
include/linux/huge_mm.h | 2 +-
mm/huge_memory.c | 17 ++++++++++++-----
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2949e5acff35..4ae1b52d7411 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -378,7 +378,7 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
enum split_type split_type);
int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
struct list_head *list);
-
+int folio_split_underused(struct folio *folio);
static inline int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..91f7fad72c8a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4045,7 +4045,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
*/
static int __folio_split(struct folio *folio, unsigned int new_order,
struct page *split_at, struct page *lock_at,
- struct list_head *list, enum split_type split_type)
+ struct list_head *list, enum split_type split_type,
+ bool is_underused_thp)
{
XA_STATE(xas, &folio->mapping->i_pages, folio->index);
struct folio *end_folio = folio_next(folio);
@@ -4174,7 +4175,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
if (nr_shmem_dropped)
shmem_uncharge(mapping->host, nr_shmem_dropped);
- if (!ret && is_anon && !folio_is_device_private(folio))
+ if (!ret && is_anon && !folio_is_device_private(folio) && is_underused_thp)
ttu_flags = TTU_USE_SHARED_ZEROPAGE;
remap_page(folio, 1 << old_order, ttu_flags);
@@ -4309,7 +4310,7 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
struct folio *folio = page_folio(page);
return __folio_split(folio, new_order, &folio->page, page, list,
- SPLIT_TYPE_UNIFORM);
+ SPLIT_TYPE_UNIFORM, false);
}
/**
@@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
struct page *split_at, struct list_head *list)
{
return __folio_split(folio, new_order, split_at, &folio->page, list,
- SPLIT_TYPE_NON_UNIFORM);
+ SPLIT_TYPE_NON_UNIFORM, false);
+}
+
+int folio_split_underused(struct folio *folio)
+{
+ return __folio_split(folio, 0, &folio->page, &folio->page,
+ NULL, SPLIT_TYPE_NON_UNIFORM, true);
}
/**
@@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
}
if (!folio_trylock(folio))
goto requeue;
- if (!split_folio(folio)) {
+ if (!folio_split_underused(folio)) {
did_split = true;
if (underused)
count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
--
2.54.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 17:05 [RFC] mm: restrict zero-page remapping to underused THP splits Nico Pache
@ 2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 8:25 ` Lance Yang
` (2 more replies)
2026-05-09 3:21 ` Lance Yang
1 sibling, 3 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-08 21:32 UTC (permalink / raw)
To: Nico Pache, linux-mm, linux-kernel
Cc: yuzhao, usamaarif642, lance.yang, baohua, dev.jain, ryan.roberts,
liam, baolin.wang, ziy, ljs, akpm
On 5/8/26 19:05, Nico Pache wrote:
> Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
> when splitting isolated thp"), splitting an anonymous THP remaps all
> zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
> This flag is set unconditionally for every anonymous folio split,
> including splits triggered by KSM.
And even when the underused scanner is effectively disabled on a system. Hm.
I don't quite like that we scan for zeropages when nobody even requested us to
split because of zeropages.
I can see why we would want to scan for zeropages in a setup where the underused
scanner is active, even when the split was triggered by someone/something else
(below).
[...]
> /**
> @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> struct page *split_at, struct list_head *list)
> {
> return __folio_split(folio, new_order, split_at, &folio->page, list,
> - SPLIT_TYPE_NON_UNIFORM);
> + SPLIT_TYPE_NON_UNIFORM, false);
> +}
> +
> +int folio_split_underused(struct folio *folio)
> +{
> + return __folio_split(folio, 0, &folio->page, &folio->page,
> + NULL, SPLIT_TYPE_NON_UNIFORM, true);
> }
>
> /**
> @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> }
> if (!folio_trylock(folio))
> goto requeue;
> - if (!split_folio(folio)) {
> + if (!folio_split_underused(folio)) {
> did_split = true;
> if (underused)
> count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
In general, this looks clean.
But imagine the following: someone splits the THP for another reason: for
example, because migration is unable to allocate a 2M THP, or because we have to
split on swapout etc.
Not freeing the zero-filled pages means that these pages cannot be reclaimed
anymore easily. We split a possibly underused THP but didn't free the memory.
The only way to free the memory would be to wait for another collapse, and then
have the new THP be detected as underused.
Hm.
(1) As you say, the alternative is to let KSM say that it wants to handle the
zero-filled pages itself. I'm not the biggest fan of that approach. We still
have two mechanisms interacting to some degree.
(2) Another approach is to just let KSM handle this in VMAs that are marked as
mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
That really just stops both mechanisms from interacting.
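An untested sketch of what (2) could look like in
try_to_map_unused_to_zeropage() (pvmw->vma should be in scope there;
ksm_run/KSM_RUN_MERGE are currently internal to mm/ksm.c, so we'd need a
small accessor or similar):

	/* Let KSM deduplicate zero-filled pages in mergeable VMAs itself. */
	if ((pvmw->vma->vm_flags & VM_MERGEABLE) && ksm_run == KSM_RUN_MERGE)
		return false;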
(3) Yet another approach I could think of (in general) is to disable the
underused handling in a system where the underused splitting is entirely disabled.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9d499da0ac7..5eca99271957 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
unsigned long huge_anon_orders_inherit __read_mostly;
static bool anon_orders_configured __initdata;
+static bool thp_underused_split_active(void)
+{
+ if (!split_underused_thp)
+ return false;
+
+ return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
+}
+
static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
struct inode *inode;
@@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
new_order,
if (nr_shmem_dropped)
shmem_uncharge(mapping->host, nr_shmem_dropped);
- if (!ret && is_anon && !folio_is_device_private(folio))
+ if (!ret && is_anon && !folio_is_device_private(folio) &&
+ thp_underused_split_active())
ttu_flags = TTU_USE_SHARED_ZEROPAGE;
remap_page(folio, 1 << old_order, ttu_flags);
@@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
int num_zero_pages = 0, num_filled_pages = 0;
int i;
- if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
+ if (!thp_underused_split_active())
return false;
if (folio_contain_hwpoisoned_page(folio))
I tend to like (2), and maybe (3) on top. Opinions?
--
Cheers,
David
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 17:05 [RFC] mm: restrict zero-page remapping to underused THP splits Nico Pache
2026-05-08 21:32 ` David Hildenbrand (Arm)
@ 2026-05-09 3:21 ` Lance Yang
2026-05-11 18:42 ` Nico Pache
1 sibling, 1 reply; 12+ messages in thread
From: Lance Yang @ 2026-05-09 3:21 UTC (permalink / raw)
To: npache
Cc: linux-mm, linux-kernel, yuzhao, usamaarif642, lance.yang, baohua,
dev.jain, ryan.roberts, liam, baolin.wang, ziy, ljs, david, akpm
On Fri, May 08, 2026 at 11:05:09AM -0600, Nico Pache wrote:
>Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
>when splitting isolated thp"), splitting an anonymous THP remaps all
>zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
>This flag is set unconditionally for every anonymous folio split,
>including splits triggered by KSM.
>
>When KSM is enabled with THP=always, this causes two regressions:
>
>1. use_zero_pages=1: KSM calls try_to_merge_one_page() which triggers
> split_huge_page(). The split remaps all 512 zero-filled subpages to
> the shared zeropage at once, freeing the entire 2MB THP when KSM only
> intended to process a single 4KB page. This bypasses KSM's
> pages_to_scan rate limiting, causing ~1GB to be freed almost
> instantly.
>
>2. use_zero_pages=0: The same split side-effect occurs through the
> stable/unstable tree merge paths. Each pages_to_scan iteration
> triggers an expensive split_huge_page() that silently frees 2MB,
> while the scanner wastes cycles on tree searches for zero-filled
> pages that were already freed as a side-effect.
>
>Fix this by restricting TTU_USE_SHARED_ZEROPAGE to only the deferred
>split shrinker path (deferred_split_scan), which is the only caller that
>intentionally splits underused THPs to reclaim zero-filled subpages.
>Introduce folio_split_underused() as a dedicated entry point that
>passes is_underused_thp=true through __folio_split(), and use it from
>deferred_split_scan(). All other split callers (KSM, compaction, etc.)
>no longer get the zero-page remapping side-effect.
>
>Reviewer notes: this patch is one of two potential approaches. This patch
>turns off the zero-page freeing that all other callers have performed
>since the noted commit, leaving only the underused shrinker with that
>behavior. We could also take the opposite approach with something like
>split_huge_page_no_zeropage() and call it from within KSM.
>
>Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
>Signed-off-by: Nico Pache <npache@redhat.com>
>---
> include/linux/huge_mm.h | 2 +-
> mm/huge_memory.c | 17 ++++++++++++-----
> 2 files changed, 13 insertions(+), 6 deletions(-)
>
>diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>index 2949e5acff35..4ae1b52d7411 100644
>--- a/include/linux/huge_mm.h
>+++ b/include/linux/huge_mm.h
>@@ -378,7 +378,7 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
> enum split_type split_type);
> int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
> struct list_head *list);
>-
>+int folio_split_underused(struct folio *folio);
> static inline int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> unsigned int new_order)
> {
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 970e077019b7..91f7fad72c8a 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -4045,7 +4045,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> */
> static int __folio_split(struct folio *folio, unsigned int new_order,
> struct page *split_at, struct page *lock_at,
>- struct list_head *list, enum split_type split_type)
>+ struct list_head *list, enum split_type split_type,
>+ bool is_underused_thp)
> {
> XA_STATE(xas, &folio->mapping->i_pages, folio->index);
> struct folio *end_folio = folio_next(folio);
>@@ -4174,7 +4175,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
>- if (!ret && is_anon && !folio_is_device_private(folio))
>+ if (!ret && is_anon && !folio_is_device_private(folio) && is_underused_thp)
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
>@@ -4309,7 +4310,7 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
> struct folio *folio = page_folio(page);
>
> return __folio_split(folio, new_order, &folio->page, page, list,
>- SPLIT_TYPE_UNIFORM);
>+ SPLIT_TYPE_UNIFORM, false);
> }
>
> /**
>@@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> struct page *split_at, struct list_head *list)
> {
> return __folio_split(folio, new_order, split_at, &folio->page, list,
>- SPLIT_TYPE_NON_UNIFORM);
>+ SPLIT_TYPE_NON_UNIFORM, false);
>+}
>+
>+int folio_split_underused(struct folio *folio)
>+{
>+ return __folio_split(folio, 0, &folio->page, &folio->page,
>+ NULL, SPLIT_TYPE_NON_UNIFORM, true);
IIUC, it should be SPLIT_TYPE_UNIFORM, not SPLIT_TYPE_NON_UNIFORM ...
deferred_split_scan() used split_folio(), so for the underused case it
split the whole THP uniformly down to order-0 pages. The shared zeropage
remapping happens later, via remove_migration_ptes(), after the split.
With SPLIT_TYPE_NON_UNIFORM and split_at == &folio->page, most of an
order-9 THP can stay as larger folios.
Then try_to_map_unused_to_zeropage() rejects those folios:
if (PageCompound(page) || PageHWPoison(page))
return false;
So the underused shrinker would no longer remap/free many zero-filled
subpages ...
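That is, the underused entry point would presumably want something like
(untested, just flipping the split type in the proposed helper):

	int folio_split_underused(struct folio *folio)
	{
		return __folio_split(folio, 0, &folio->page, &folio->page,
				NULL, SPLIT_TYPE_UNIFORM, true);
	}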
> }
>
> /**
>@@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> }
> if (!folio_trylock(folio))
> goto requeue;
>- if (!split_folio(folio)) {
>+ if (!folio_split_underused(folio)) {
> did_split = true;
> if (underused)
> count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>--
>2.54.0
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 21:32 ` David Hildenbrand (Arm)
@ 2026-05-09 8:25 ` Lance Yang
2026-05-10 11:39 ` Usama Arif
2026-05-11 18:40 ` Nico Pache
2 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2026-05-09 8:25 UTC (permalink / raw)
To: david, npache
Cc: linux-mm, linux-kernel, yuzhao, usamaarif642, lance.yang, baohua,
dev.jain, ryan.roberts, liam, baolin.wang, ziy, ljs, akpm
On Fri, May 08, 2026 at 11:32:09PM +0200, David Hildenbrand (Arm) wrote:
>On 5/8/26 19:05, Nico Pache wrote:
>> Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
>> when splitting isolated thp"), splitting an anonymous THP remaps all
>> zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
>> This flag is set unconditionally for every anonymous folio split,
>> including splits triggered by KSM.
>
>And even when the underused scanner is effectively disabled on a system. Hm.
>
>I don't quite like that we scan for zeropages when nobody even requested us to
>split because of zeropages.
>
>I can see why we would want to scan for zeropages in a setup where the underused
>scanner is active, even when the split was triggered by someone/something else
>(below).
>
>[...]
>
>> /**
>> @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
>> struct page *split_at, struct list_head *list)
>> {
>> return __folio_split(folio, new_order, split_at, &folio->page, list,
>> - SPLIT_TYPE_NON_UNIFORM);
>> + SPLIT_TYPE_NON_UNIFORM, false);
>> +}
>> +
>> +int folio_split_underused(struct folio *folio)
>> +{
>> + return __folio_split(folio, 0, &folio->page, &folio->page,
>> + NULL, SPLIT_TYPE_NON_UNIFORM, true);
>> }
>>
>> /**
>> @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>> }
>> if (!folio_trylock(folio))
>> goto requeue;
>> - if (!split_folio(folio)) {
>> + if (!folio_split_underused(folio)) {
>> did_split = true;
>> if (underused)
>> count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>
>In general, this looks clean.
>
>But imagine the following: someone splits the THP for another reason: for
>example, because migration is unable to allocate a 2M THP, or because we have to
>split on swapout etc.
>
>Not freeing the zero-filled pages means that these pages cannot be reclaimed
>anymore easily. We split a possibly underused THP but didn't free the memory.
>
>The only way to free the memory would be to wait for another collapse, and then
>have the new THP be detected as underused.
>
>Hm.
>
>(1) As you say, the alternative is to let KSM say that it wants to handle the
>zero-filled pages itself. I'm not the biggest fan of that approach. We still
>have two mechanisms interacting to some degree.
>
>(2) Another approach is to just let KSM handle this in VMAs that are marked as
>mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
>KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
>
>That really just stops both mechanisms from interacting.
>
>(3) Yet another approach I could think of (in general) is to disable the
>underused handling in a system where the underused splitting is entirely disabled.
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index e9d499da0ac7..5eca99271957 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
> unsigned long huge_anon_orders_inherit __read_mostly;
> static bool anon_orders_configured __initdata;
>
>+static bool thp_underused_split_active(void)
>+{
>+ if (!split_underused_thp)
>+ return false;
>+
>+ return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
>+}
>+
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
>@@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
>new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
>- if (!ret && is_anon && !folio_is_device_private(folio))
>+ if (!ret && is_anon && !folio_is_device_private(folio) &&
>+ thp_underused_split_active())
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
>@@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
> int num_zero_pages = 0, num_filled_pages = 0;
> int i;
>
>- if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
>+ if (!thp_underused_split_active())
> return false;
>
> if (folio_contain_hwpoisoned_page(folio))
>
>
>
>I tend to like (2), and maybe (3) on top. Opinions?
Cool! (2) + (3) sounds good to me ;)
For VM_MERGEABLE VMAs while KSM is running, it makes sense to let KSM handle
zero-filled pages itself. Without (2), the split path may remap many
zero-filled subpages to the shared zeropage before KSM gets to them ...
With (2), those subpages remain normal anon pages for KSM to process
later, according to its own settings (such as use_zero_pages) and its
own scan pacing (pages_to_scan).
For other VMAs, keeping the opportunistic shared zeropage remap seems
useful while split_underused_thp is active. Once the THP is split, the
underused shrinker cannot find it anymore :)
And, yes, if split_underused_thp is disabled, generic THP splits should
not do this extra scan/remap work; just leave those zero-filled pages
alone, IMHO :D
Cheers, Lance
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 8:25 ` Lance Yang
@ 2026-05-10 11:39 ` Usama Arif
2026-05-11 6:36 ` David Hildenbrand (Arm)
2026-05-11 18:40 ` Nico Pache
2 siblings, 1 reply; 12+ messages in thread
From: Usama Arif @ 2026-05-10 11:39 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Usama Arif, Nico Pache, linux-mm, linux-kernel, yuzhao,
usamaarif642, lance.yang, baohua, dev.jain, ryan.roberts, liam,
baolin.wang, ziy, ljs, akpm
On Fri, 8 May 2026 23:32:09 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> On 5/8/26 19:05, Nico Pache wrote:
> > Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
> > when splitting isolated thp"), splitting an anonymous THP remaps all
> > zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
> > This flag is set unconditionally for every anonymous folio split,
> > including splits triggered by KSM.
>
> And even when the underused scanner is effectively disabled on a system. Hm.
>
> I don't quite like that we scan for zeropages when nobody even requested us to
> split because of zeropages.
>
> I can see why we would want to scan for zeropages in a setup where the underused
> scanner is active, even when the split was triggered by someone/something else
> (below).
>
> [...]
>
> > /**
> > @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> > struct page *split_at, struct list_head *list)
> > {
> > return __folio_split(folio, new_order, split_at, &folio->page, list,
> > - SPLIT_TYPE_NON_UNIFORM);
> > + SPLIT_TYPE_NON_UNIFORM, false);
> > +}
> > +
> > +int folio_split_underused(struct folio *folio)
> > +{
> > + return __folio_split(folio, 0, &folio->page, &folio->page,
> > + NULL, SPLIT_TYPE_NON_UNIFORM, true);
> > }
> >
> > /**
> > @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> > }
> > if (!folio_trylock(folio))
> > goto requeue;
> > - if (!split_folio(folio)) {
> > + if (!folio_split_underused(folio)) {
> > did_split = true;
> > if (underused)
> > count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>
> In general, this looks clean.
>
> But imagine the following: someone splits the THP for another reason: for
> example, because migration is unable to allocate a 2M THP, or because we have to
> split on swapout etc.
>
> Not freeing the zero-filled pages means that these pages cannot be reclaimed
> anymore easily. We split a possibly underused THP but didn't free the memory.
>
> The only way to free the memory would be to wait for another collapse, and then
> have the new THP be detected as underused.
>
> Hm.
>
> (1) As you say, the alternative is to let KSM say that it wants to handle the
> zero-filled pages itself. I'm not the biggest fan of that approach. We still
> have two mechanisms interacting to some degree.
>
> (2) Another approach is to just let KSM handle this in VMAs that are marked as
> mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
> KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
>
> That really just stops both mechanisms from interacting.
>
> (3) Yet another approach I could think of (in general) is to disable the
> underused handling in a system where the underused splitting is entirely disabled.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9d499da0ac7..5eca99271957 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
> unsigned long huge_anon_orders_inherit __read_mostly;
> static bool anon_orders_configured __initdata;
>
> +static bool thp_underused_split_active(void)
> +{
> + if (!split_underused_thp)
> + return false;
> +
> + return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
> +}
> +
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
> @@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
> new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
> - if (!ret && is_anon && !folio_is_device_private(folio))
> + if (!ret && is_anon && !folio_is_device_private(folio) &&
> + thp_underused_split_active())
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
> @@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
> int num_zero_pages = 0, num_filled_pages = 0;
> int i;
>
> - if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> + if (!thp_underused_split_active())
> return false;
>
> if (folio_contain_hwpoisoned_page(folio))
>
>
>
> I tend to like (2), and maybe (3) on top. Opinions?
>
Hello!
I think (3) definitely makes sense.
I have not had a deep look at KSM up until just now, so might be dumb
to say all of below.. :)
What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
THP whenever it actually wants to merge a single 4K chunk. That seems like a
lot of work for a single 4K?
One thing that came to my mind is to have a separate tree for THPs and only
merge the THPs that have the same content, but the possibility of encountering
2M pages with same content is extremely low? so this is probably a bad idea.
An alternative is, does it even make sense to process and split THPs by KSM
in the way it works now? IMO this is a lot of work for a single 4K merge.
Shrinker is designed to release memory when it's needed, i.e. reclaim, at
which point IMO free memory is more important than performance. But KSM runs
all the time.. so constantly splitting THPs every time a single 4K can be
merged just hurts performance all the time. If someone cares about memory,
they should be running the shrinker. Is a better alternative that KSM skips
THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
only then KSM gets those 4K subpages?
Above sounds like reworking KSM, but just wanted to put it out there.
(2) + (3) sounds like a good solution, but I wonder if above alternative
of KSM just skipping THPs might be better?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-10 11:39 ` Usama Arif
@ 2026-05-11 6:36 ` David Hildenbrand (Arm)
2026-05-11 13:10 ` Usama Arif
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 6:36 UTC (permalink / raw)
To: Usama Arif
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
>>
>> I tend to like (2), and maybe (3) on top. Opinions?
>>
>
> Hello!
Hi!
>
> I think (3) definitely makes sense.
>
> I have not had a deep look at KSM up until just now, so might be dumb
> to say all of below.. :)
>
> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
> THP whenever it actually wants to merge a single 4K chunk. That seems like a
> lot of work for a single 4K?
Yes, but that's what the users ask for: if there is a chance to deduplicate
memory, it shall be deduplicated asap.
>
> One thing that came to my mind is to have a separate tree for THPs and only
> merge the THPs that have the same content, but the possibility of encoutering
> 2M pages with same content is extremely low? so this is probably a bad idea.
Right, the probability is low, and it would change existing semantics, breaking
existing users.
In addition, we would have to add large folio support for KSM, which I rather
would avoid.
>
> An alternative is, does it even make sense to process and split THPs by KSM
> in the way it works now? IMO this is a lot of work for a single 4K merge.
> Shrinker is designed to release memory when it's needed, i.e. reclaim, at
> which point IMO free memory is more important than performance. But KSM runs
> all the time.. so constantly splitting THPs every time a single 4K can be
> merged just hurts performance all the time.
Right, but that's what you get with KSM: bad performance if there is a chance to
deduplicate :)
(and bad performance from scanning overhead)
> If someone cares about memory,
> they should be running the shrinker.
It's not just the zero page, but really any page content. The zero page is
currently only "special" after we added conditional support to deduplicate to
the shared zeropage in KSM. The shrinker doesn't help for any other page content
besides zero-filled.
Further, the shrinker is something system-wide, whereby KSM is usually only
enabled for selected VMAs (with some exceptions nowadays).
Also note that KSM deduplicates independent of the folio size: not just THPs,
but really any (large) folio. Yes, it splits large folios, but that's really
just to keep the T in THP.
> Is a better alternative that KSM skips
> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
> only then KSM gets those 4K subpages?
>
> Above sounds like reworking KSM, but just wanted to put it out there.
Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>
> (2) + (3) sounds like a good solution, but I wonder if above alternative
> of KSM just skipping THPs might be better?
That would change the semantics where, for example, we expect that memory
was deduplicated after a KSM run.
VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
except where we can deduplicate memory. Skipping THPs would essentially break
the main use case for KSM :)
Does that make sense?
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 6:36 ` David Hildenbrand (Arm)
@ 2026-05-11 13:10 ` Usama Arif
2026-05-11 13:42 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: Usama Arif @ 2026-05-11 13:10 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>
>>>
>>> I tend to like (2), and maybe (3) on top. Opinions?
>>>
>>
>> Hello!
>
>
> Hi!
>
>>
>> I think (3) definitely makes sense.
>>
>> I have not had a deep look at KSM up until just now, so might be dumb
>> to say all of below.. :)
>>
>> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
>> THP whenever it actually wants to merge a single 4K chunk. That seems like a
>> lot of work for a single 4K?
>
> Yes, but that's what the users ask for: if there is a chance to deduplicate
> memory, it shall be deduplicated asap.
>
>>
>> One thing that came to my mind is to have a separate tree for THPs and only
>> merge the THPs that have the same content, but the possibility of encoutering
>> 2M pages with same content is extremely low? so this is probably a bad idea.
>
> Right, the probability is low, and it would change existing semantics, breaking
> existing users.
>
> In addition, we would have to add large folio support for KSM, which I would
> rather avoid.
>
>>
>> An alternative is, does it even make sense to process and split THPs by KSM
>> in the way it works now? IMO this is a lot of work for a single 4K merge.
>> The shrinker is designed to release memory when it's needed, i.e. reclaim, at
>> which point IMO free memory is more important than performance. But KSM runs
>> all the time, so constantly splitting THPs every time a single 4K can be
>> merged just hurts performance all the time.
>
> Right, but that's what you get with KSM: bad performance if there is a chance to
> deduplicate :)
>
> (and bad performance from scanning overhead)
>
>> If someone cares about memory,
>> they should be running the shrinker.
>
> It's not just the zero page, but really any page content. The zero page is
> currently only "special" after we added conditional support to deduplicate to
> the shared zeropage in KSM. The shrinker doesn't help for any other page content
> besides zero-filled.
>
> Further, the shrinker is something system-wide, whereas KSM is usually only
> enabled for selected VMAs (with some exceptions nowadays).
>
> Also note that KSM deduplicates independent of the folio size: not just THPs,
> but really any (large) folio. Yes, it splits large folios, but that's really
> just to keep the T in THP.
>
>> Is a better alternative that KSM skips
>> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
>> only then KSM gets those 4K subpages?
>>
>> Above sounds like reworking KSM, but just wanted to put it out there.
>
> Right, and it makes KSM more THP-aware, which is something I would avoid right now.
>
>>
>> (2) + (3) sounds like a good solution, but I wonder if above alternative
>> of KSM just skipping THPs might be better?
>
> That would change the semantics: for example, we expect that memory
> was deduplicated after a KSM run.
>
> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
> except where we can deduplicate memory. Skipping THPs would essentially break
> the main use case for KSM :)
>
> Does that make sense?
>
Yes, all of the above makes sense. But I feel like this means someone should not
set the THP policy to always and enable KSM at the same time. In general I feel
like KSM is not something that should be run on big servers, as hopefully you are
not managing memory in 4K chunks on big machines that use a lot of THPs.
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 13:10 ` Usama Arif
@ 2026-05-11 13:42 ` David Hildenbrand (Arm)
2026-05-11 13:44 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 13:42 UTC (permalink / raw)
To: Usama Arif
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 5/11/26 15:10, Usama Arif wrote:
>
>
> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>
>>>
>>> Hello!
>>
>>
>> Hi!
>>
>>>
>>> I think (3) definitely makes sense.
>>>
>>> I have not had a deep look at KSM up until just now, so might be dumb
>>> to say all of below.. :)
>>>
>>> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
>>> THP whenever it actually wants to merge a single 4K chunk. That seems like a
>>> lot of work for a single 4K?
>>
>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>> memory, it shall be deduplicated asap.
>>
>>>
>>> One thing that came to my mind is to have a separate tree for THPs and only
>>> merge the THPs that have the same content, but the possibility of encountering
>>> 2M pages with same content is extremely low? so this is probably a bad idea.
>>
>> Right, the probability is low, and it would change existing semantics, breaking
>> existing users.
>>
>> In addition, we would have to add large folio support for KSM, which I would
>> rather avoid.
>>
>>>
>>> An alternative is, does it even make sense to process and split THPs by KSM
>>> in the way it works now? IMO this is a lot of work for a single 4K merge.
>>> The shrinker is designed to release memory when it's needed, i.e. reclaim, at
>>> which point IMO free memory is more important than performance. But KSM runs
>>> all the time, so constantly splitting THPs every time a single 4K can be
>>> merged just hurts performance all the time.
>>
>> Right, but that's what you get with KSM: bad performance if there is a chance to
>> deduplicate :)
>>
>> (and bad performance from scanning overhead)
>>
>>> If someone cares about memory,
>>> they should be running the shrinker.
>>
>> It's not just the zero page, but really any page content. The zero page is
>> currently only "special" after we added conditional support to deduplicate to
>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>> besides zero-filled.
>>
>> Further, the shrinker is something system-wide, whereas KSM is usually only
>> enabled for selected VMAs (with some exceptions nowadays).
>>
>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>> but really any (large) folio. Yes, it splits large folios, but that's really
>> just to keep the T in THP.
>>
>>> Is a better alternative that KSM skips
>>> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
>>> only then KSM gets those 4K subpages?
>>>
>>> Above sounds like reworking KSM, but just wanted to put it out there.
>>
>> Right, and it makes KSM more THP-aware, which is something I would avoid right now.
>>
>>>
>>> (2) + (3) sounds like a good solution, but I wonder if above alternative
>>> of KSM just skipping THPs might be better?
>>
>> That would change the semantics: for example, we expect that memory
>> was deduplicated after a KSM run.
>>
>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>> except where we can deduplicate memory. Skipping THPs would essentially break
>> the main use case for KSM :)
>>
>> Does that make sense?
>>
>
> Yes, all of above makes sense. But I feel like this means someone should not
> set THP policy to always and enable KSM together.
IIRC, QEMU will, by default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
(KSM itself later has to be enabled manually on a system level)
> In general I feel like KSM
> is not something that should be run on big servers, as hopefully you are
> not managing memory as 4K chunks for big machines and using a lot of THPs.
Right. But the 4k chunks are movable and compaction can move them around to
create THPs elsewhere.
--
Cheers,
David
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 13:42 ` David Hildenbrand (Arm)
@ 2026-05-11 13:44 ` David Hildenbrand (Arm)
2026-05-11 14:15 ` Usama Arif
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 13:44 UTC (permalink / raw)
To: Usama Arif
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 5/11/26 15:42, David Hildenbrand (Arm) wrote:
> On 5/11/26 15:10, Usama Arif wrote:
>>
>>
>> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>>
>>>
>>>
>>> Hi!
>>>
>>>
>>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>>> memory, it shall be deduplicated asap.
>>>
>>>
>>> Right, the probability is low, and it would change existing semantics, breaking
>>> existing users.
>>>
>>> In addition, we would have to add large folio support for KSM, which I rather
>>> would avoid.
>>>
>>>
>>> Right, but that's what you get with KSM: bad performance if there is a chance to
>>> deduplicate :)
>>>
>>> (and bad performance from scanning overhead)
>>>
>>>
>>> It's not just the zero page, but really any page content. The zero page is
>>> currently only "special" after we added conditional support to deduplicate to
>>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>>> besides zero-filled.
>>>
>>> Further, the shrinker is something system-wide, whereby KSM is usually only
>>> enabled for selected VMAs (with some exceptions nowadays).
>>>
>>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>>> but really any (large) folio. Yes, it splits large folios, but that's really
>>> just to keep the T in THP.
>>>
>>>
>>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>>
>>>
>>> That would change the semantics where, for example, where we expect that memory
>>> was deduplicated after a KSM run.
>>>
>>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>>> except where we can deduplicate memory. Skipping THPs would essentially break
>>> the main use case for KSM :)
>>>
>>> Does that make sense?
>>>
>>
>> Yes, all of above makes sense. But I feel like this means someone should not
>> set THP policy to always and enable KSM together.
>
> IIRC, QEMU will, as default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
>
> (KSM itself later has to be enabled manually on a system level)
Digging around, RHEL documents that one might want to consider disabling THPs
for performance:
"As KSM can reduce the occurence of transparent huge pages, you may want to
disable it before enabling THP." [1]
But that doesn't mean that some people aren't using that combination.
In the end "some THPs" is better than "no THPs" :)
[1]
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-memory-huge_pages
--
Cheers,
David
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 13:44 ` David Hildenbrand (Arm)
@ 2026-05-11 14:15 ` Usama Arif
0 siblings, 0 replies; 12+ messages in thread
From: Usama Arif @ 2026-05-11 14:15 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 11/05/2026 14:44, David Hildenbrand (Arm) wrote:
> On 5/11/26 15:42, David Hildenbrand (Arm) wrote:
>> On 5/11/26 15:10, Usama Arif wrote:
>>>
>>>
>>> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>>>
>>>>
>>>>
>>>> Hi!
>>>>
>>>>
>>>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>>>> memory, it shall be deduplicated asap.
>>>>
>>>>
>>>> Right, the probability is low, and it would change existing semantics, breaking
>>>> existing users.
>>>>
>>>> In addition, we would have to add large folio support for KSM, which I rather
>>>> would avoid.
>>>>
>>>>
>>>> Right, but that's what you get with KSM: bad performance if there is a chance to
>>>> deduplicate :)
>>>>
>>>> (and bad performance from scanning overhead)
>>>>
>>>>
>>>> It's not just the zero page, but really any page content. The zero page is
>>>> currently only "special" after we added conditional support to deduplicate to
>>>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>>>> besides zero-filled.
>>>>
>>>> Further, the shrinker is something system-wide, whereby KSM is usually only
>>>> enabled for selected VMAs (with some exceptions nowadays).
>>>>
>>>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>>>> but really any (large) folio. Yes, it splits large folios, but that's really
>>>> just to keep the T in THP.
>>>>
>>>>
>>>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>>>
>>>>
>>>> That would change the semantics where, for example, where we expect that memory
>>>> was deduplicated after a KSM run.
>>>>
>>>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>>>> except where we can deduplicate memory. Skipping THPs would essentially break
>>>> the main use case for KSM :)
>>>>
>>>> Does that make sense?
>>>>
>>>
>>> Yes, all of above makes sense. But I feel like this means someone should not
>>> set THP policy to always and enable KSM together.
>>
>> IIRC, QEMU will, as default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
That is interesting... :)
>>
>> (KSM itself later has to be enabled manually on a system level)
>
> Digging around, RHEL documents that one might want to consider disabling THPs
> for performance:
>
> "As KSM can reduce the occurence of transparent huge pages, you may want to
> disable it before enabling THP." [1]
>
> But that doesn't mean that some people are using that combination.
>
> In the end "some THPs" is better than "no THPs" :)
>
> [1]
> https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-memory-huge_pages
>
Thanks for sharing this and the link, it's very useful.
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 8:25 ` Lance Yang
2026-05-10 11:39 ` Usama Arif
@ 2026-05-11 18:40 ` Nico Pache
2 siblings, 0 replies; 12+ messages in thread
From: Nico Pache @ 2026-05-11 18:40 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, yuzhao, usamaarif642, lance.yang, baohua,
dev.jain, ryan.roberts, liam, baolin.wang, ziy, ljs, akpm
On Fri, May 8, 2026 at 3:32 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 5/8/26 19:05, Nico Pache wrote:
> > Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
> > when splitting isolated thp"), splitting an anonymous THP remaps all
> > zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
> > This flag is set unconditionally for every anonymous folio split,
> > including splits triggered by KSM.
>
> And even when the underused scanner is effectively disabled on a system. Hm.
>
> I don't quite like that we scan for zeropages when nobody even requested us to
> split because of zeropages.
>
> I can see why we would want to scan for zeropages in a setup where the underused
> scanner is active, even when the split was triggered by someone/something else
> (below).
>
> [...]
>
> > /**
> > @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> > struct page *split_at, struct list_head *list)
> > {
> > return __folio_split(folio, new_order, split_at, &folio->page, list,
> > - SPLIT_TYPE_NON_UNIFORM);
> > + SPLIT_TYPE_NON_UNIFORM, false);
> > +}
> > +
> > +int folio_split_underused(struct folio *folio)
> > +{
> > + return __folio_split(folio, 0, &folio->page, &folio->page,
> > + NULL, SPLIT_TYPE_NON_UNIFORM, true);
> > }
> >
> > /**
> > @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> > }
> > if (!folio_trylock(folio))
> > goto requeue;
> > - if (!split_folio(folio)) {
> > + if (!folio_split_underused(folio)) {
> > did_split = true;
> > if (underused)
> > count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>
> In general, this looks clean.
>
> But imagine the following: someone splits the THP for another reason: for
> example, because migration is unable to allocate a 2M THP, or because we have to
> split on swapout etc.
>
> Not freeing the zero-filled pages means that these pages cannot be reclaimed
> anymore easily. We split a possibly underused THP but didn't free the memory.
>
> The only way to free the memory would be to wait for another collapse, and then
> have the new THP be detected as underused.
>
> Hm.
And what was the expected behavior before this commit? Did we just
live with the wasted memory?
>
> (1) As you say, the alternative is to let KSM say that it wants to handle the
> zero-filled pages itself. I'm not a the biggest fan of that approach. We still
> have two mechanisms interacting to some degree.
>
> (2) Another approach is to just let KSM handle this in VMAs that are marked as
> mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
> KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
>
> That really just stops both mechanisms from interacting.
>
> (3) Yet another approach I could think of (in general) is to disable the
> underused handling in a system where the underused splitting is entirely disabled.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9d499da0ac7..5eca99271957 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
> unsigned long huge_anon_orders_inherit __read_mostly;
> static bool anon_orders_configured __initdata;
>
> +static bool thp_underused_split_active(void)
> +{
> + if (!split_underused_thp)
> + return false;
> +
> + return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
> +}
> +
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
> @@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
> new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
> - if (!ret && is_anon && !folio_is_device_private(folio))
> + if (!ret && is_anon && !folio_is_device_private(folio) &&
> + thp_underused_split_active())
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
> @@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
> int num_zero_pages = 0, num_filled_pages = 0;
> int i;
>
> - if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> + if (!thp_underused_split_active())
> return false;
>
> if (folio_contain_hwpoisoned_page(folio))
>
>
>
> I tend to like (2), and maybe (3) on top. Opinions?
I don't fully understand (2), but I definitely agree with (3).
Isn't (2) similar to my split_huge_page_no_zeropage() solution, in that
it only disables the behavior for KSM, just handled much further down in
the call path? The "fix" commit sold this as a solution ONLY for the
underused shrinker, but it is not that.
Either way, I can test that solution. It seemed clean when I gave it a
second pass; it just hides the behavior deeper in the code, but at the
proper location (the function that checks whether this action should
occur).
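If I'm reading (2) right, the gate would behave roughly like this userspace model (the flag value, constant names, and function name below are illustrative stand-ins, not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the option-(2) check, NOT kernel code: skip the
 * shared-zeropage remap when the VMA is KSM-mergeable and ksmd is
 * actively merging, so KSM keeps ownership of zero-filled pages and the
 * two mechanisms stop interacting. Values below are made up. */
#define VM_MERGEABLE_BIT  (1UL << 7)  /* stand-in for the kernel's VM_MERGEABLE */
#define KSM_RUN_STOP      0
#define KSM_RUN_MERGE     1

static bool may_map_unused_to_zeropage(unsigned long vm_flags, int ksm_run)
{
	/* KSM owns deduplication in mergeable VMAs while it is running. */
	if ((vm_flags & VM_MERGEABLE_BIT) && ksm_run == KSM_RUN_MERGE)
		return false;
	/* Everyone else keeps the split-time zeropage remap. */
	return true;
}
```

The nice property is that a KSM-triggered split no longer frees 2MB as a side-effect, while splits of non-mergeable VMAs are unaffected.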
Cheers,
-- Nico
>
>
> --
> Cheers,
>
> David
>
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-09 3:21 ` Lance Yang
@ 2026-05-11 18:42 ` Nico Pache
0 siblings, 0 replies; 12+ messages in thread
From: Nico Pache @ 2026-05-11 18:42 UTC (permalink / raw)
To: Lance Yang
Cc: linux-mm, linux-kernel, yuzhao, usamaarif642, baohua, dev.jain,
ryan.roberts, liam, baolin.wang, ziy, ljs, david, akpm
On Fri, May 8, 2026 at 9:22 PM Lance Yang <lance.yang@linux.dev> wrote:
>
>
> On Fri, May 08, 2026 at 11:05:09AM -0600, Nico Pache wrote:
> >Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
> >when splitting isolated thp"), splitting an anonymous THP remaps all
> >zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
> >This flag is set unconditionally for every anonymous folio split,
> >including splits triggered by KSM.
> >
> >When KSM is enabled with THP=always, this causes two regressions:
> >
> >1. use_zero_pages=1: KSM calls try_to_merge_one_page() which triggers
> > split_huge_page(). The split remaps all 512 zero-filled subpages to
> > the shared zeropage at once, freeing the entire 2MB THP when KSM only
> > intended to process a single 4KB page. This bypasses KSM's
> > pages_to_scan rate limiting, causing ~1GB to be freed almost
> > instantly.
> >
> >2. use_zero_pages=0: The same split side-effect occurs through the
> > stable/unstable tree merge paths. Each pages_to_scan iteration
> > triggers an expensive split_huge_page() that silently frees 2MB,
> > while the scanner wastes cycles on tree searches for zero-filled
> > pages that were already freed as a side-effect.
> >
> >Fix this by restricting TTU_USE_SHARED_ZEROPAGE to only the deferred
> >split shrinker path (deferred_split_scan), which is the only caller that
> >intentionally splits underused THPs to reclaim zero-filled subpages.
> >Introduce folio_split_underused() as a dedicated entry point that
> >passes is_underused_thp=true through __folio_split(), and use it from
> >deferred_split_scan(). All other split callers (KSM, compaction, etc.)
> >no longer get the zero-page remapping side-effect.
> >
> >Reviewers notes: this patch is one of two potential approaches. This patch
> >turns off the zero-page freeing that has been done since the noted commit,
> >in all the other callers, only leaving the underused shrinker to do such
> >behavior. We can also take the opposite approach of with something like
> >split_huge_page_no_zeropage() and call this within KSM.
> >
> >Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
> >Signed-off-by: Nico Pache <npache@redhat.com>
> >---
> > include/linux/huge_mm.h | 2 +-
> > mm/huge_memory.c | 17 ++++++++++++-----
> > 2 files changed, 13 insertions(+), 6 deletions(-)
> >
> >diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >index 2949e5acff35..4ae1b52d7411 100644
> >--- a/include/linux/huge_mm.h
> >+++ b/include/linux/huge_mm.h
> >@@ -378,7 +378,7 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
> > enum split_type split_type);
> > int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
> > struct list_head *list);
> >-
> >+int folio_split_underused(struct folio *folio);
> > static inline int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > unsigned int new_order)
> > {
> >diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >index 970e077019b7..91f7fad72c8a 100644
> >--- a/mm/huge_memory.c
> >+++ b/mm/huge_memory.c
> >@@ -4045,7 +4045,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> > */
> > static int __folio_split(struct folio *folio, unsigned int new_order,
> > struct page *split_at, struct page *lock_at,
> >- struct list_head *list, enum split_type split_type)
> >+ struct list_head *list, enum split_type split_type,
> >+ bool is_underused_thp)
> > {
> > XA_STATE(xas, &folio->mapping->i_pages, folio->index);
> > struct folio *end_folio = folio_next(folio);
> >@@ -4174,7 +4175,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> > if (nr_shmem_dropped)
> > shmem_uncharge(mapping->host, nr_shmem_dropped);
> >
> >- if (!ret && is_anon && !folio_is_device_private(folio))
> >+ if (!ret && is_anon && !folio_is_device_private(folio) && is_underused_thp)
> > ttu_flags = TTU_USE_SHARED_ZEROPAGE;
> >
> > remap_page(folio, 1 << old_order, ttu_flags);
> >@@ -4309,7 +4310,7 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
> > struct folio *folio = page_folio(page);
> >
> > return __folio_split(folio, new_order, &folio->page, page, list,
> >- SPLIT_TYPE_UNIFORM);
> >+ SPLIT_TYPE_UNIFORM, false);
> > }
> >
> > /**
> >@@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> > struct page *split_at, struct list_head *list)
> > {
> > return __folio_split(folio, new_order, split_at, &folio->page, list,
> >- SPLIT_TYPE_NON_UNIFORM);
> >+ SPLIT_TYPE_NON_UNIFORM, false);
> >+}
> >+
> >+int folio_split_underused(struct folio *folio)
> >+{
> >+ return __folio_split(folio, 0, &folio->page, &folio->page,
> >+ NULL, SPLIT_TYPE_NON_UNIFORM, true);
>
> IIUC, it should be SPLIT_TYPE_UNIFORM, not SPLIT_TYPE_NON_UNIFORM ...
That's what I had originally, then I convinced myself otherwise. I'll
re-verify before submitting again, unless we end up with a different
solution like David suggested.
Thanks!
-- Nico
>
> deferred_split_scan() used split_folio(), so for the underused case it
> split the whole THP uniformly down to order-0 pages. The shared zeropage
> remapping happens later, via remove_migration_ptes(), after the split.
>
> With SPLIT_TYPE_NON_UNIFORM and split_at == &folio->page, most of an
> order-9 THP can stay as larger folios.
>
> Then try_to_map_unused_to_zeropage() rejects those folios:
>
> if (PageCompound(page) || PageHWPoison(page))
> return false;
>
> So the underused shrinker would no longer remap/free many zero-filled
> subpages ...
>
> > }
> >
> > /**
> >@@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> > }
> > if (!folio_trylock(folio))
> > goto requeue;
> >- if (!split_folio(folio)) {
> >+ if (!folio_split_underused(folio)) {
> > did_split = true;
> > if (underused)
> > count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
> >--
> >2.54.0
> >
> >
>
end of thread, other threads:[~2026-05-11 18:41 UTC | newest]
Thread overview: 12+ messages
2026-05-08 17:05 [RFC] mm: restrict zero-page remapping to underused THP splits Nico Pache
2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 8:25 ` Lance Yang
2026-05-10 11:39 ` Usama Arif
2026-05-11 6:36 ` David Hildenbrand (Arm)
2026-05-11 13:10 ` Usama Arif
2026-05-11 13:42 ` David Hildenbrand (Arm)
2026-05-11 13:44 ` David Hildenbrand (Arm)
2026-05-11 14:15 ` Usama Arif
2026-05-11 18:40 ` Nico Pache
2026-05-09 3:21 ` Lance Yang
2026-05-11 18:42 ` Nico Pache