From: "Zach O'Keefe" <zokeefe@google.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
David Hildenbrand <david@redhat.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Nico Pache <npache@redhat.com>,
linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net,
rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, akpm@linux-foundation.org,
baohua@kernel.org, willy@infradead.org, peterx@redhat.com,
wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
sunnanyong@huawei.com, vishal.moola@gmail.com,
thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
kas@kernel.org, aarcange@redhat.com, raquini@redhat.com,
anshuman.khandual@arm.com, catalin.marinas@arm.com,
tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com,
jack@suse.cz, jglisse@google.com, surenb@google.com,
hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
rdunlap@infradead.org, hughd@google.com,
richard.weiyang@gmail.com, lance.yang@linux.dev, vbabka@suse.cz,
rppt@kernel.org, jannh@google.com,
Bagas Sanjaya <bagasdotme@gmail.com>
Subject: Re: [PATCH v12 mm-new 15/15] Documentation: mm: update the admin guide for mTHP collapse
Date: Fri, 24 Oct 2025 06:54:44 -0700
Message-ID: <CAAa6QmQBmZ-82PwzLao=gO-+1u=GFyPogmVOjAFQ-esVdo9tYQ@mail.gmail.com>
In-Reply-To: <c62m3tyr6co7jqdrwhtp7exnewhogxtife7g6yh4gve7gqecz6@b5xpocyvifxp>
On Thu, Oct 23, 2025 at 1:44 AM Pedro Falcato <pfalcato@suse.de> wrote:
>
> On Thu, Oct 23, 2025 at 09:00:10AM +0100, Lorenzo Stoakes wrote:
> > On Wed, Oct 22, 2025 at 10:22:08PM +0200, David Hildenbrand wrote:
> > > On 22.10.25 21:52, Christoph Lameter (Ampere) wrote:
> > > > On Wed, 22 Oct 2025, Nico Pache wrote:
> > > >
> > > > > Currently, madvise_collapse only supports collapsing to PMD-sized THPs
> > > > > and does not attempt mTHP collapses.
> > > >
> > > > madvise collapse is frequently used as far as I can tell from the THP
> > > > loads being tested. Could we support madvise collapse for mTHP?
> > >
> > > The big question is still how user space can communicate the desired order,
> > > and how we can not break existing users.
> >
>
> Do we want to let userspace communicate order? It seems like an extremely
> specific thing to do. A simpler and saner semantic could be something like:
> "MADV_COLLAPSE collapses a given [addr, addr+len] range into the highest
> order THP it can/thinks it should". The implementation details of PMD or
> contpte or <...> are lost by the time we get to userspace.
>
> The man page itself is pretty vaguely written to allow us to do whatever
> we want. It sounds to me that allowing userspace to create arbitrary order
> mTHPs would be another Pandora's box we shouldn't get into.
>
> > Yes, and let's go one step at a time, this series still needs careful scrutiny
> > and we need to ensure the _fundamentals_ are in place for khugepaged before we
> > get into MADV_COLLAPSE :)
> >
> > >
> > > So I guess there will definitely be some support to trigger collapse to mTHP
> > > in the future, the big question is through which interface. So it will
> > > happen after this series.
> >
> > Yes.
> >
> > >
> > > Maybe through process_madvise() where we have an additional parameter, I
> > > think that was what people discussed in the past.
> >
> > I wouldn't absolutely love us doing that, given it is a general parameter so
> > would seem applicable to any madvise() option and could lead to confusion, also
> > process_madvise() was originally for cross-process madvise vector operations.
>
> For what it's worth, it would probably not be too hard to devise a generic
> separation there between "generic flags" and "behavior-specific flags".
> And then stuff the desired THP order into MADV_COLLAPSE-specific flags.
Yeah, this is how I envisioned the flags being leveraged: reserve some
number of bits for generic flags, and overload the rest for
advice-specific ones. I suspect once the seal is broken on this, more
advice-specific flags will promptly follow.
> >
> > I expanded this to make it applicable to the current process (and introduced
> > PIDFD_SELF to make that more sane), and SJ has optimised it across vector
> > operations (thanks SJ! :), but in general - it seems very weird to have
> > madvise() provide an operation for which process_madvise() provides another
> > version that has an extra parameter.
> >
> > As usual we've painted ourselves into a corner with an API... :)
>
> But yes, I agree it would feel weird.
>
> >
> > Perhaps we'll have to accept the process_madvise() compromise and add
> > MADV_COLLAPSE_MTHP that only works with it or something.
> >
> > Of course adding a new syscall isn't impossible... madvise2() not very appealing
> > however...
>
> It is my impression that process_madvise() is already madvise2(), but
> poorly named.
+1
> >
> > TL;DR I guess we'll deal with that when we come to it :)
>
> Amen :)
>
> --
> Pedro