* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 21:32 ` David Hildenbrand (Arm)
@ 2026-05-09 8:25 ` Lance Yang
2026-05-10 11:39 ` Usama Arif
2026-05-11 18:40 ` Nico Pache
2 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2026-05-09 8:25 UTC (permalink / raw)
To: david, npache
Cc: linux-mm, linux-kernel, yuzhao, usamaarif642, lance.yang, baohua,
dev.jain, ryan.roberts, liam, baolin.wang, ziy, ljs, akpm
On Fri, May 08, 2026 at 11:32:09PM +0200, David Hildenbrand (Arm) wrote:
>On 5/8/26 19:05, Nico Pache wrote:
>> Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
>> when splitting isolated thp"), splitting an anonymous THP remaps all
>> zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
>> This flag is set unconditionally for every anonymous folio split,
>> including splits triggered by KSM.
>
>And even when the underused scanner is effectively disabled on a system. Hm.
>
>I don't quite like that we scan for zeropages when nobody even requested us to
>split because of zeropages.
>
>I can see why we would want to scan for zeropages in a setup where the underused
>scanner is active, even when the split was triggered by someone/something else
>(below).
>
>[...]
>
>> /**
>> @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
>> struct page *split_at, struct list_head *list)
>> {
>> return __folio_split(folio, new_order, split_at, &folio->page, list,
>> - SPLIT_TYPE_NON_UNIFORM);
>> + SPLIT_TYPE_NON_UNIFORM, false);
>> +}
>> +
>> +int folio_split_underused(struct folio *folio)
>> +{
>> + return __folio_split(folio, 0, &folio->page, &folio->page,
>> + NULL, SPLIT_TYPE_NON_UNIFORM, true);
>> }
>>
>> /**
>> @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>> }
>> if (!folio_trylock(folio))
>> goto requeue;
>> - if (!split_folio(folio)) {
>> + if (!folio_split_underused(folio)) {
>> did_split = true;
>> if (underused)
>> count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>
>In general, this looks clean.
>
>But imagine the following: someone splits the THP for another reason: for
>example, because migration is unable to allocate a 2M THP, or because we have to
>split on swapout etc.
>
>Not freeing the zero-filled pages means that these pages cannot be reclaimed
>anymore easily. We split a possibly underused THP but didn't free the memory.
>
>The only way to free the memory would be to wait for another collapse, and then
>have the new THP be detected as underused.
>
>Hm.
>
>(1) As you say, the alternative is to let KSM say that it wants to handle the
>zero-filled pages itself. I'm not the biggest fan of that approach. We still
>have two mechanisms interacting to some degree.
>
>(2) Another approach is to just let KSM handle this in VMAs that are marked as
>mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
>KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
>
>That really just stops both mechanisms from interacting.
>
>(3) Yet another approach I could think of (in general) is to disable the
>underused handling in a system where the underused splitting is entirely disabled.
>
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index e9d499da0ac7..5eca99271957 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
> unsigned long huge_anon_orders_inherit __read_mostly;
> static bool anon_orders_configured __initdata;
>
>+static bool thp_underused_split_active(void)
>+{
>+ if (!split_underused_thp)
>+ return false;
>+
>+ return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
>+}
>+
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
>@@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
>new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
>- if (!ret && is_anon && !folio_is_device_private(folio))
>+ if (!ret && is_anon && !folio_is_device_private(folio) &&
>+ thp_underused_split_active())
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
>@@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
> int num_zero_pages = 0, num_filled_pages = 0;
> int i;
>
>- if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
>+ if (!thp_underused_split_active())
> return false;
>
> if (folio_contain_hwpoisoned_page(folio))
>
>
>
>I tend to like (2), and maybe (3) on top. Opinions?
Cool! (2) + (3) sounds good to me ;)
For VM_MERGEABLE VMAs while KSM is running, it makes sense to let KSM handle
zero-filled pages itself. Without (2), the split path may remap many
zero-filled subpages to the shared zeropage before KSM gets to them ...
With (2), those subpages remain normal anon pages for KSM to process
later, according to its own settings, such as use_zero_pages, and scan
pacing, such as pages_to_scan.
For other VMAs, keeping the opportunistic shared zeropage remap seems
useful while split_underused_thp is active. Once the THP is split, the
underused shrinker cannot find it anymore :)
And, yes, if split_underused_thp is disabled, generic THP splits should
not do this extra scan/remap work; just leave those zero-filled pages
alone, IMHO :D
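To make the combined (2) + (3) behaviour concrete, here is a minimal
userspace sketch of the decision. It is not kernel code: struct split_policy
and may_remap_to_shared_zeropage() are hypothetical names introduced for
illustration, HPAGE_PMD_NR is assumed to be 512 (2M THP, 4K base pages), and
only thp_underused_split_active() mirrors the helper from the diff quoted
above.

```c
#include <stdbool.h>

/* Assumption: 2M THP built from 4K base pages. */
#define HPAGE_PMD_NR 512

/* Hypothetical snapshot of the tunables discussed in the thread. */
struct split_policy {
	bool split_underused_thp;	/* models the split_underused_thp toggle */
	unsigned int max_ptes_none;	/* models khugepaged_max_ptes_none */
	bool ksm_running;		/* models ksm_run == KSM_RUN_MERGE */
};

/* Option (3): is underused splitting active at all on this system? */
bool thp_underused_split_active(const struct split_policy *p)
{
	if (!p->split_underused_thp)
		return false;
	return p->max_ptes_none != HPAGE_PMD_NR - 1;
}

/*
 * Option (2) on top of (3): skip the zeropage remap for mergeable VMAs
 * while KSM is running, so KSM can process those subpages itself.
 */
bool may_remap_to_shared_zeropage(const struct split_policy *p,
				  bool vma_mergeable)
{
	if (!thp_underused_split_active(p))
		return false;
	if (vma_mergeable && p->ksm_running)
		return false;
	return true;
}
```

In this model a non-mergeable VMA still gets the opportunistic remap whenever
underused splitting is active, which matches the intent described above.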
Cheers, Lance
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 8:25 ` Lance Yang
@ 2026-05-10 11:39 ` Usama Arif
2026-05-11 6:36 ` David Hildenbrand (Arm)
2026-05-11 18:40 ` Nico Pache
2 siblings, 1 reply; 12+ messages in thread
From: Usama Arif @ 2026-05-10 11:39 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Usama Arif, Nico Pache, linux-mm, linux-kernel, yuzhao,
usamaarif642, lance.yang, baohua, dev.jain, ryan.roberts, liam,
baolin.wang, ziy, ljs, akpm
On Fri, 8 May 2026 23:32:09 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> On 5/8/26 19:05, Nico Pache wrote:
> > Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
> > when splitting isolated thp"), splitting an anonymous THP remaps all
> > zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
> > This flag is set unconditionally for every anonymous folio split,
> > including splits triggered by KSM.
>
> And even when the underused scanner is effectively disabled on a system. Hm.
>
> I don't quite like that we scan for zeropages when nobody even requested us to
> split because of zeropages.
>
> I can see why we would want to scan for zeropages in a setup where the underused
> scanner is active, even when the split was triggered by someone/something else
> (below).
>
> [...]
>
> > /**
> > @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> > struct page *split_at, struct list_head *list)
> > {
> > return __folio_split(folio, new_order, split_at, &folio->page, list,
> > - SPLIT_TYPE_NON_UNIFORM);
> > + SPLIT_TYPE_NON_UNIFORM, false);
> > +}
> > +
> > +int folio_split_underused(struct folio *folio)
> > +{
> > + return __folio_split(folio, 0, &folio->page, &folio->page,
> > + NULL, SPLIT_TYPE_NON_UNIFORM, true);
> > }
> >
> > /**
> > @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> > }
> > if (!folio_trylock(folio))
> > goto requeue;
> > - if (!split_folio(folio)) {
> > + if (!folio_split_underused(folio)) {
> > did_split = true;
> > if (underused)
> > count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>
> In general, this looks clean.
>
> But imagine the following: someone splits the THP for another reason: for
> example, because migration is unable to allocate a 2M THP, or because we have to
> split on swapout etc.
>
> Not freeing the zero-filled pages means that these pages cannot be reclaimed
> anymore easily. We split a possibly underused THP but didn't free the memory.
>
> The only way to free the memory would be to wait for another collapse, and then
> have the new THP be detected as underused.
>
> Hm.
>
> (1) As you say, the alternative is to let KSM say that it wants to handle the
> zero-filled pages itself. I'm not the biggest fan of that approach. We still
> have two mechanisms interacting to some degree.
>
> (2) Another approach is to just let KSM handle this in VMAs that are marked as
> mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
> KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
>
> That really just stops both mechanisms from interacting.
>
> (3) Yet another approach I could think of (in general) is to disable the
> underused handling in a system where the underused splitting is entirely disabled.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9d499da0ac7..5eca99271957 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
> unsigned long huge_anon_orders_inherit __read_mostly;
> static bool anon_orders_configured __initdata;
>
> +static bool thp_underused_split_active(void)
> +{
> + if (!split_underused_thp)
> + return false;
> +
> + return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
> +}
> +
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
> @@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
> new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
> - if (!ret && is_anon && !folio_is_device_private(folio))
> + if (!ret && is_anon && !folio_is_device_private(folio) &&
> + thp_underused_split_active())
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
> @@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
> int num_zero_pages = 0, num_filled_pages = 0;
> int i;
>
> - if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> + if (!thp_underused_split_active())
> return false;
>
> if (folio_contain_hwpoisoned_page(folio))
>
>
>
> I tend to like (2), and maybe (3) on top. Opinions?
>
Hello!
I think (3) definitely makes sense.
I have not had a deep look at KSM up until just now, so might be dumb
to say all of below.. :)
What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
THP whenever it actually wants to merge a single 4K chunk. That seems like a
lot of work for a single 4K?
One thing that came to my mind is to have a separate tree for THPs and only
merge the THPs that have the same content, but the possibility of encountering
2M pages with the same content is extremely low, so this is probably a bad idea.
An alternative is, does it even make sense to process and split THPs by KSM
in the way it works now? IMO this is a lot of work for a single 4K merge.
The shrinker is designed to release memory when it's needed, i.e. reclaim, at
which point IMO free memory is more important than performance. But KSM runs
all the time, so constantly splitting THPs every time a single 4K can be
merged just hurts performance all the time. If someone cares about memory,
they should be running the shrinker. Is a better alternative that KSM skips
THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
only then KSM gets those 4K subpages?
Above sounds like reworking KSM, but just wanted to put it out there.
(2) + (3) sounds like a good solution, but I wonder if above alternative
of KSM just skipping THPs might be better?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-10 11:39 ` Usama Arif
@ 2026-05-11 6:36 ` David Hildenbrand (Arm)
2026-05-11 13:10 ` Usama Arif
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 6:36 UTC (permalink / raw)
To: Usama Arif
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
>>
>> I tend to like (2), and maybe (3) on top. Opinions?
>>
>
> Hello!
Hi!
>
> I think (3) definitely makes sense.
>
> I have not had a deep look at KSM up until just now, so might be dumb
> to say all of below.. :)
>
> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
> THP whenever it actually wants to merge a single 4K chunk. That seems like a
> lot of work for a single 4K?
Yes, but that's what the users ask for: if there is a chance to deduplicate
memory, it shall be deduplicated asap.
>
> One thing that came to my mind is to have a separate tree for THPs and only
> merge the THPs that have the same content, but the possibility of encoutering
> 2M pages with same content is extremely low? so this is probably a bad idea.
Right, the probability is low, and it would change existing semantics, breaking
existing users.
In addition, we would have to add large folio support for KSM, which I would
rather avoid.
>
> An alternative is, does it even make sense to process and split THPs by KSM
> in the way it works now? IMO this is a lot of work for a single 4K merge.
> Shrinker is designed to release memory when its needed, i.e. reclaim, at
> which point IMO free memory is more important than performance. But KSM runs
> all the time.. so constantly splitting THPs everytime a single 4K can be
> merged just hurts performance all the time.
Right, but that's what you get with KSM: bad performance if there is a chance to
deduplicate :)
(and bad performance from scanning overhead)
> If someone cares about memory,
> they should be running the shrinker.
It's not just the zero page, but really any page content. The zero page is
currently only "special" after we added conditional support to deduplicate to
the shared zeropage in KSM. The shrinker doesn't help for any other page content
besides zero-filled.
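As a rough illustration of why the shrinker only helps with zero-filled
content, here is a userspace sketch of an "underused" check. It is a model,
not the kernel's thp_underused(): the function names are made up, the sizes
assume a 2M THP with 4K subpages, and the threshold rule (more zero-filled
subpages than max_ptes_none) is stated as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGE_SIZE 4096u	/* assumed 4K base page */
#define NR_SUBPAGES  512u	/* assumed 2M THP */

/* Returns true if the 4K chunk at p contains only zero bytes. */
bool subpage_is_zero(const unsigned char *p)
{
	for (size_t i = 0; i < SUBPAGE_SIZE; i++)
		if (p[i])
			return false;
	return true;
}

/* Counts zero-filled subpages in a buffer of NR_SUBPAGES * SUBPAGE_SIZE bytes. */
unsigned int count_zero_subpages(const unsigned char *thp)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < NR_SUBPAGES; i++)
		if (subpage_is_zero(thp + (size_t)i * SUBPAGE_SIZE))
			n++;
	return n;
}

/* Model of the decision: underused if zero subpages exceed max_ptes_none. */
bool looks_underused(const unsigned char *thp, unsigned int max_ptes_none)
{
	if (max_ptes_none == NR_SUBPAGES - 1)
		return false;	/* threshold at its maximum: treated as disabled */
	return count_zero_subpages(thp) > max_ptes_none;
}
```

A subpage with a single non-zero byte is never counted here, even if many
other subpages hold the same bytes. Identical non-zero content is the case
that only KSM can deduplicate.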
Further, the shrinker is something system-wide, whereby KSM is usually only
enabled for selected VMAs (with some exceptions nowadays).
Also note that KSM deduplicates independent of the folio size: not just THPs,
but really any (large) folio. Yes, it splits large folios, but that's really
just to keep the T in THP.
> Is a better alternative that KSM skips
> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
> only then KSM gets those 4K subpages?
>
> Above sounds like reworking KSM, but just wanted to put it out there.
Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>
> (2) + (3) sounds like a good solution, but I wonder if above alternative
> of KSM just skipping THPs might be better?
That would change the semantics where, for example, we expect that memory
was deduplicated after a KSM run.
VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
except where we can deduplicate memory. Skipping THPs would essentially break
the main use case for KSM :)
Does that make sense?
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 6:36 ` David Hildenbrand (Arm)
@ 2026-05-11 13:10 ` Usama Arif
2026-05-11 13:42 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: Usama Arif @ 2026-05-11 13:10 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>
>>>
>>> I tend to like (2), and maybe (3) on top. Opinions?
>>>
>>
>> Hello!
>
>
> Hi!
>
>>
>> I think (3) definitely makes sense.
>>
>> I have not had a deep look at KSM up until just now, so might be dumb
>> to say all of below.. :)
>>
>> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
>> THP whenever it actually wants to merge a single 4K chunk. That seems like a
>> lot of work for a single 4K?
>
> Yes, but that's what the users ask for: if there is a chance to deduplicate
> memory, it shall be deduplicated asap.
>
>>
>> One thing that came to my mind is to have a separate tree for THPs and only
>> merge the THPs that have the same content, but the possibility of encoutering
>> 2M pages with same content is extremely low? so this is probably a bad idea.
>
> Right, the probability is low, and it would change existing semantics, breaking
> existing users.
>
> In addition, we would have to add large folio support for KSM, which I rather
> would avoid.
>
>>
>> An alternative is, does it even make sense to process and split THPs by KSM
>> in the way it works now? IMO this is a lot of work for a single 4K merge.
>> Shrinker is designed to release memory when its needed, i.e. reclaim, at
>> which point IMO free memory is more important than performance. But KSM runs
>> all the time.. so constantly splitting THPs everytime a single 4K can be
>> merged just hurts performance all the time.
>
> Right, but that's what you get with KSM: bad performance if there is a chance to
> deduplicate :)
>
> (and bad performance from scanning overhead)
>
>> If someone cares about memory,
>> they should be running the shrinker.
>
> It's not just the zero page, but really any page content. The zero page is
> currently only "special" after we added conditional support to deduplicate to
> the shared zeropage in KSM. The shrinker doesn't help for any other page content
> besides zero-filled.
>
> Further, the shrinker is something system-wide, whereby KSM is usually only
> enabled for selected VMAs (with some exceptions nowadays).
>
> Also note that KSM deduplicates independent of the folio size: not just THPs,
> but really any (large) folio. Yes, it splits large folios, but that's really
> just to keep the T in THP.
>
>> Is a better alternative that KSM skips
>> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
>> only then KSM gets those 4K subpages?
>>
>> Above sounds like reworking KSM, but just wanted to put it out there.
>
> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>
>>
>> (2) + (3) sounds like a good solution, but I wonder if above alternative
>> of KSM just skipping THPs might be better?
>
> That would change the semantics where, for example, where we expect that memory
> was deduplicated after a KSM run.
>
> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
> except where we can deduplicate memory. Skipping THPs would essentially break
> the main use case for KSM :)
>
> Does that make sense?
>
Yes, all of above makes sense. But I feel like this means someone should not
set THP policy to always and enable KSM together. In general I feel like KSM
is not something that should be run on big servers, as hopefully you are
not managing memory as 4K chunks for big machines and using a lot of THPs.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 13:10 ` Usama Arif
@ 2026-05-11 13:42 ` David Hildenbrand (Arm)
2026-05-11 13:44 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 13:42 UTC (permalink / raw)
To: Usama Arif
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 5/11/26 15:10, Usama Arif wrote:
>
>
> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>
>>>
>>> Hello!
>>
>>
>> Hi!
>>
>>>
>>> I think (3) definitely makes sense.
>>>
>>> I have not had a deep look at KSM up until just now, so might be dumb
>>> to say all of below.. :)
>>>
>>> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
>>> THP whenever it actually wants to merge a single 4K chunk. That seems like a
>>> lot of work for a single 4K?
>>
>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>> memory, it shall be deduplicated asap.
>>
>>>
>>> One thing that came to my mind is to have a separate tree for THPs and only
>>> merge the THPs that have the same content, but the possibility of encoutering
>>> 2M pages with same content is extremely low? so this is probably a bad idea.
>>
>> Right, the probability is low, and it would change existing semantics, breaking
>> existing users.
>>
>> In addition, we would have to add large folio support for KSM, which I rather
>> would avoid.
>>
>>>
>>> An alternative is, does it even make sense to process and split THPs by KSM
>>> in the way it works now? IMO this is a lot of work for a single 4K merge.
>>> Shrinker is designed to release memory when its needed, i.e. reclaim, at
>>> which point IMO free memory is more important than performance. But KSM runs
>>> all the time.. so constantly splitting THPs everytime a single 4K can be
>>> merged just hurts performance all the time.
>>
>> Right, but that's what you get with KSM: bad performance if there is a chance to
>> deduplicate :)
>>
>> (and bad performance from scanning overhead)
>>
>>> If someone cares about memory,
>>> they should be running the shrinker.
>>
>> It's not just the zero page, but really any page content. The zero page is
>> currently only "special" after we added conditional support to deduplicate to
>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>> besides zero-filled.
>>
>> Further, the shrinker is something system-wide, whereby KSM is usually only
>> enabled for selected VMAs (with some exceptions nowadays).
>>
>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>> but really any (large) folio. Yes, it splits large folios, but that's really
>> just to keep the T in THP.
>>
>>> Is a better alternative that KSM skips
>>> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
>>> only then KSM gets those 4K subpages?
>>>
>>> Above sounds like reworking KSM, but just wanted to put it out there.
>>
>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>
>>>
>>> (2) + (3) sounds like a good solution, but I wonder if above alternative
>>> of KSM just skipping THPs might be better?
>>
>> That would change the semantics where, for example, where we expect that memory
>> was deduplicated after a KSM run.
>>
>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>> except where we can deduplicate memory. Skipping THPs would essentially break
>> the main use case for KSM :)
>>
>> Does that make sense?
>>
>
> Yes, all of above makes sense. But I feel like this means someone should not
> set THP policy to always and enable KSM together.
IIRC, QEMU will, as default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
(KSM itself later has to be enabled manually on a system level)
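For reference, a minimal Linux-only sketch of a process applying both hints
to one anonymous mapping. This is not QEMU's code; map_guest_ram_like() is a
hypothetical helper. Either madvise() call can fail on kernels built without
THP or KSM support, so the results are reported separately, and merging only
happens once KSM is switched on system-wide (for example through
/sys/kernel/mm/ksm/run).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Maps len bytes of anonymous memory and applies both hints.
 * Returns the mapping, or NULL if mmap() fails. The two madvise()
 * return values are stored in *huge_rc and *merge_rc.
 */
void *map_guest_ram_like(size_t len, int *huge_rc, int *merge_rc)
{
	void *p;

	assert(huge_rc && merge_rc);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*huge_rc = madvise(p, len, MADV_HUGEPAGE);	/* opt in to THP */
	*merge_rc = madvise(p, len, MADV_MERGEABLE);	/* mark the range for KSM */
	return p;
}
```

With both hints set on the same range, a THP-backed region is also a KSM
candidate, which is the combination discussed in this thread.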
> In general I feel like KSM
> is not something that should be run on big servers, as hopefully you are
> not managing memory as 4K chunks for big machines and using a lot of THPs.
Right. But the 4k chunks are movable and compaction can move them around to
create THPs elsewhere.
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 13:42 ` David Hildenbrand (Arm)
@ 2026-05-11 13:44 ` David Hildenbrand (Arm)
2026-05-11 14:15 ` Usama Arif
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 13:44 UTC (permalink / raw)
To: Usama Arif
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 5/11/26 15:42, David Hildenbrand (Arm) wrote:
> On 5/11/26 15:10, Usama Arif wrote:
>>
>>
>> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>>
>>>
>>>
>>> Hi!
>>>
>>>
>>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>>> memory, it shall be deduplicated asap.
>>>
>>>
>>> Right, the probability is low, and it would change existing semantics, breaking
>>> existing users.
>>>
>>> In addition, we would have to add large folio support for KSM, which I rather
>>> would avoid.
>>>
>>>
>>> Right, but that's what you get with KSM: bad performance if there is a chance to
>>> deduplicate :)
>>>
>>> (and bad performance from scanning overhead)
>>>
>>>
>>> It's not just the zero page, but really any page content. The zero page is
>>> currently only "special" after we added conditional support to deduplicate to
>>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>>> besides zero-filled.
>>>
>>> Further, the shrinker is something system-wide, whereby KSM is usually only
>>> enabled for selected VMAs (with some exceptions nowadays).
>>>
>>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>>> but really any (large) folio. Yes, it splits large folios, but that's really
>>> just to keep the T in THP.
>>>
>>>
>>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>>
>>>
>>> That would change the semantics where, for example, where we expect that memory
>>> was deduplicated after a KSM run.
>>>
>>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>>> except where we can deduplicate memory. Skipping THPs would essentially break
>>> the main use case for KSM :)
>>>
>>> Does that make sense?
>>>
>>
>> Yes, all of above makes sense. But I feel like this means someone should not
>> set THP policy to always and enable KSM together.
>
> IIRC, QEMU will, as default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
>
> (KSM itself later has to be enabled manually on a system level)
Digging around, RHEL documents that one might want to consider disabling THPs
for performance:
"As KSM can reduce the occurence of transparent huge pages, you may want to
disable it before enabling THP." [1]
But that doesn't mean that some people aren't using that combination.
In the end "some THPs" is better than "no THPs" :)
[1]
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-memory-huge_pages
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-11 13:44 ` David Hildenbrand (Arm)
@ 2026-05-11 14:15 ` Usama Arif
0 siblings, 0 replies; 12+ messages in thread
From: Usama Arif @ 2026-05-11 14:15 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Nico Pache, linux-mm, linux-kernel, yuzhao, usamaarif642,
lance.yang, baohua, dev.jain, ryan.roberts, liam, baolin.wang,
ziy, ljs, akpm
On 11/05/2026 14:44, David Hildenbrand (Arm) wrote:
> On 5/11/26 15:42, David Hildenbrand (Arm) wrote:
>> On 5/11/26 15:10, Usama Arif wrote:
>>>
>>>
>>> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>>>
>>>>
>>>>
>>>> Hi!
>>>>
>>>>
>>>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>>>> memory, it shall be deduplicated asap.
>>>>
>>>>
>>>> Right, the probability is low, and it would change existing semantics, breaking
>>>> existing users.
>>>>
>>>> In addition, we would have to add large folio support for KSM, which I rather
>>>> would avoid.
>>>>
>>>>
>>>> Right, but that's what you get with KSM: bad performance if there is a chance to
>>>> deduplicate :)
>>>>
>>>> (and bad performance from scanning overhead)
>>>>
>>>>
>>>> It's not just the zero page, but really any page content. The zero page is
>>>> currently only "special" after we added conditional support to deduplicate to
>>>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>>>> besides zero-filled.
>>>>
>>>> Further, the shrinker is something system-wide, whereby KSM is usually only
>>>> enabled for selected VMAs (with some exceptions nowadays).
>>>>
>>>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>>>> but really any (large) folio. Yes, it splits large folios, but that's really
>>>> just to keep the T in THP.
>>>>
>>>>
>>>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>>>
>>>>
>>>> That would change the semantics where, for example, where we expect that memory
>>>> was deduplicated after a KSM run.
>>>>
>>>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>>>> except where we can deduplicate memory. Skipping THPs would essentially break
>>>> the main use case for KSM :)
>>>>
>>>> Does that make sense?
>>>>
>>>
>>> Yes, all of above makes sense. But I feel like this means someone should not
>>> set THP policy to always and enable KSM together.
>>
>> IIRC, QEMU will, as default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
That is interesting... :)
>>
>> (KSM itself later has to be enabled manually on a system level)
>
> Digging around, RHEL documents that one might want to consider disabling THPs
> for performance:
>
> "As KSM can reduce the occurence of transparent huge pages, you may want to
> disable it before enabling THP." [1]
>
> But that doesn't mean that some people are using that combination.
>
> In the end "some THPs" is better than "no THPs" :)
>
> [1]
> https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-memory-huge_pages
>
Thanks for sharing this and the link, it's very useful.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] mm: restrict zero-page remapping to underused THP splits
2026-05-08 21:32 ` David Hildenbrand (Arm)
2026-05-09 8:25 ` Lance Yang
2026-05-10 11:39 ` Usama Arif
@ 2026-05-11 18:40 ` Nico Pache
2 siblings, 0 replies; 12+ messages in thread
From: Nico Pache @ 2026-05-11 18:40 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, yuzhao, usamaarif642, lance.yang, baohua,
dev.jain, ryan.roberts, liam, baolin.wang, ziy, ljs, akpm
On Fri, May 8, 2026 at 3:32 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 5/8/26 19:05, Nico Pache wrote:
> > Since commit b1f202060afe ("mm: remap unused subpages to shared zeropage
> > when splitting isolated thp"), splitting an anonymous THP remaps all
> > zero-filled subpages to the shared zeropage via TTU_USE_SHARED_ZEROPAGE.
> > This flag is set unconditionally for every anonymous folio split,
> > including splits triggered by KSM.
>
> And even when the underused scanner is effectively disabled on a system. Hm.
>
> I don't quite like that we scan for zeropages when nobody even requested us to
> split because of zeropages.
>
> I can see why we would want to scan for zeropages in a setup where the underused
> scanner is active, even when the split was triggered by someone/something else
> (below).
>
> [...]
>
> > /**
> > @@ -4340,7 +4341,13 @@ int folio_split(struct folio *folio, unsigned int new_order,
> > struct page *split_at, struct list_head *list)
> > {
> > return __folio_split(folio, new_order, split_at, &folio->page, list,
> > - SPLIT_TYPE_NON_UNIFORM);
> > + SPLIT_TYPE_NON_UNIFORM, false);
> > +}
> > +
> > +int folio_split_underused(struct folio *folio)
> > +{
> > + return __folio_split(folio, 0, &folio->page, &folio->page,
> > + NULL, SPLIT_TYPE_NON_UNIFORM, true);
> > }
> >
> > /**
> > @@ -4559,7 +4566,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> > }
> > if (!folio_trylock(folio))
> > goto requeue;
> > - if (!split_folio(folio)) {
> > + if (!folio_split_underused(folio)) {
> > did_split = true;
> > if (underused)
> > count_vm_event(THP_UNDERUSED_SPLIT_PAGE);
>
> In general, this looks clean.
>
> But imagine the following: someone splits the THP for another reason: for
> example, because migration is unable to allocate a 2M THP, or because we have to
> split on swapout etc.
>
> Not freeing the zero-filled pages means that these pages cannot be reclaimed
> anymore easily. We split a possibly underused THP but didn't free the memory.
>
> The only way to free the memory would be to wait for another collapse, and then
> have the new THP be detected as underused.
>
> Hm.
And what was the expected behavior before this commit? Did we just
deal with the wasted memory?
>
> (1) As you say, the alternative is to let KSM say that it wants to handle the
> zero-filled pages itself. I'm not the biggest fan of that approach. We still
> have two mechanisms interacting to some degree.
>
> (2) Another approach is to just let KSM handle this in VMAs that are marked as
> mergeable while KSM is active. That is, we check for VM_MERGEABLE and ksm_run ==
> KSM_RUN_MERGE in try_to_map_unused_to_zeropage() to just let KSM do its thing.
>
> That really just stops both mechanisms from interacting.
>
> (3) Yet another approach I could think of (in general) is to disable the
> underused handling in a system where the underused splitting is entirely disabled.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9d499da0ac7..5eca99271957 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -82,6 +82,14 @@ unsigned long huge_anon_orders_madvise __read_mostly;
> unsigned long huge_anon_orders_inherit __read_mostly;
> static bool anon_orders_configured __initdata;
>
> +static bool thp_underused_split_active(void)
> +{
> + if (!split_underused_thp)
> + return false;
> +
> + return khugepaged_max_ptes_none != HPAGE_PMD_NR - 1;
> +}
> +
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
> @@ -4188,7 +4196,8 @@ static int __folio_split(struct folio *folio, unsigned int
> new_order,
> if (nr_shmem_dropped)
> shmem_uncharge(mapping->host, nr_shmem_dropped);
>
> - if (!ret && is_anon && !folio_is_device_private(folio))
> + if (!ret && is_anon && !folio_is_device_private(folio) &&
> + thp_underused_split_active())
> ttu_flags = TTU_USE_SHARED_ZEROPAGE;
>
> remap_page(folio, 1 << old_order, ttu_flags);
> @@ -4497,7 +4506,7 @@ static bool thp_underused(struct folio *folio)
> int num_zero_pages = 0, num_filled_pages = 0;
> int i;
>
> - if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
> + if (!thp_underused_split_active())
> return false;
>
> if (folio_contain_hwpoisoned_page(folio))
>
>
>
> I tend to like (2), and maybe (3) on top. Opinions?
I don't fully understand (2) but I definitely agree with (3).
Isn't (2) similar to my split_huge_page_no_zeropage() solution in that
it only disables the behavior for KSM, but handled much further down
in the call path? The "fix" commit sold this as a
solution ONLY for the underutilized shrinker, but it is not that.
Either way I can test that solution. It seems clean when I gave it a
second pass; it just seems to be hiding behavior deeper in the code,
but at the proper location (the function that checks whether this
action should occur).
Cheers,
-- Nico
>
>
> --
> Cheers,
>
> David
>
^ permalink raw reply [flat|nested] 12+ messages in thread