From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Usama Arif <usama.arif@linux.dev>
Cc: Nico Pache <npache@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
yuzhao@google.com, usamaarif642@gmail.com, lance.yang@linux.dev,
baohua@kernel.org, dev.jain@arm.com, ryan.roberts@arm.com,
liam@infradead.org, baolin.wang@linux.alibaba.com,
ziy@nvidia.com, ljs@kernel.org, akpm@linux-foundation.org
Subject: Re: [RFC] mm: restrict zero-page remapping to underused THP splits
Date: Mon, 11 May 2026 15:42:44 +0200
Message-ID: <574fc329-bf2d-4686-9f15-b1709432326e@kernel.org>
In-Reply-To: <608bef55-44d1-47f1-a201-4a6bd7be137d@linux.dev>
On 5/11/26 15:10, Usama Arif wrote:
>
>
> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>
>>>
>>> Hello!
>>
>>
>> Hi!
>>
>>>
>>> I think (3) definitely makes sense.
>>>
>>> I have not had a deep look at KSM up until just now, so might be dumb
>>> to say all of below.. :)
>>>
>>> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
>>> THP whenever it actually wants to merge a single 4K chunk. That seems like a
>>> lot of work for a single 4K?
>>
>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>> memory, it shall be deduplicated asap.
>>
>>>
>>> One thing that came to my mind is to have a separate tree for THPs and only
>>> merge the THPs that have the same content, but the possibility of encountering
>>> 2M pages with same content is extremely low? so this is probably a bad idea.
>>
>> Right, the probability is low, and it would change existing semantics, breaking
>> existing users.
>>
>> In addition, we would have to add large folio support for KSM, which I rather
>> would avoid.
>>
>>>
>>> An alternative is, does it even make sense to process and split THPs by KSM
>>> in the way it works now? IMO this is a lot of work for a single 4K merge.
>>> Shrinker is designed to release memory when it's needed, i.e. reclaim, at
>>> which point IMO free memory is more important than performance. But KSM runs
>>> all the time.. so constantly splitting THPs every time a single 4K can be
>>> merged just hurts performance all the time.
>>
>> Right, but that's what you get with KSM: bad performance if there is a chance to
>> deduplicate :)
>>
>> (and bad performance from scanning overhead)
>>
>>> If someone cares about memory,
>>> they should be running the shrinker.
>>
>> It's not just the zero page, but really any page content. The zero page is
>> currently only "special" after we added conditional support to deduplicate to
>> the shared zeropage in KSM. The shrinker doesn't help for any other page content
>> besides zero-filled.
>>
>> Further, the shrinker is something system-wide, whereby KSM is usually only
>> enabled for selected VMAs (with some exceptions nowadays).
>>
>> Also note that KSM deduplicates independent of the folio size: not just THPs,
>> but really any (large) folio. Yes, it splits large folios, but that's really
>> just to keep the T in THP.
>>
>>> Is a better alternative that KSM skips
>>> THPs, THP shrinker splits THPs into 4K subpages when memory is needed, and
>>> only then KSM gets those 4K subpages?
>>>
>>> Above sounds like reworking KSM, but just wanted to put it out there.
>>
>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>
>>>
>>> (2) + (3) sounds like a good solution, but I wonder if above alternative
>>> of KSM just skipping THPs might be better?
>>
>> That would change the semantics where, for example, we expect that memory
>> was deduplicated after a KSM run.
>>
>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>> except where we can deduplicate memory. Skipping THPs would essentially break
>> the main use case for KSM :)
>>
>> Does that make sense?
>>
>
> Yes, all of above makes sense. But I feel like this means someone should not
> set THP policy to always and enable KSM together.

IIRC, QEMU will, by default, set MADV_HUGEPAGE and MADV_MERGEABLE :)
(KSM itself later has to be enabled manually on a system level)
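
To illustrate the combination, here is a minimal userspace sketch, not QEMU's actual code: it maps anonymous memory and applies both hints, then reads the system-wide KSM switch. The names map_guest_ram_like() and ksm_run_state() are hypothetical helpers made up for this example; madvise(2), MADV_HUGEPAGE, MADV_MERGEABLE and /sys/kernel/mm/ksm/run are the real interfaces. Either hint may be rejected if the kernel was built without THP or KSM, so the sketch records the result instead of failing.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical result type for this sketch (not part of any real API). */
struct advice_result {
	void *addr;	/* start of the mapping, or NULL if mmap() failed */
	int thp_ok;	/* 1 if the kernel accepted MADV_HUGEPAGE */
	int ksm_ok;	/* 1 if the kernel accepted MADV_MERGEABLE */
};

/*
 * Hypothetical helper: map anonymous memory the way a VMM might back
 * guest RAM and mark it both THP-eligible and KSM-mergeable.
 */
static struct advice_result map_guest_ram_like(size_t len)
{
	struct advice_result r = { NULL, 0, 0 };
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return r;

	r.addr = p;
	/* Ask for THP backing of this range. */
	r.thp_ok = madvise(p, len, MADV_HUGEPAGE) == 0;
	/* Register the same range with KSM; nothing is merged until ksmd runs. */
	r.ksm_ok = madvise(p, len, MADV_MERGEABLE) == 0;
	return r;
}

/*
 * Read the system-wide KSM switch. Returns the value found in
 * /sys/kernel/mm/ksm/run, or -1 if the file cannot be read.
 */
static int ksm_run_state(void)
{
	FILE *f = fopen("/sys/kernel/mm/ksm/run", "r");
	int v = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}
```

With both hints set, the range starts out THP-eligible, and once KSM is switched on system-wide (a "run" value of 1) ksmd will split a large folio whenever it finds a 4K subpage it can merge.
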
> In general I feel like KSM
> is not something that should be run on big servers, as hopefully you are
> not managing memory as 4K chunks for big machines and using a lot of THPs.

Right. But the 4k chunks are movable and compaction can move them around to
create THPs elsewhere.

--
Cheers,
David