DAMON development mailing list
 help / color / mirror / Atom feed
From: SJ Park <sj@kernel.org>
To: Lian Wang <lianux.mm@gmail.com>
Cc: SJ Park <sj@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
	damon@lists.linux.dev, linux-mm@kvack.org,
	daichaobing@sangfor.com.cn, kunwu.chan@gmail.com,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <liam@infradead.org>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	linux-kernel@vger.kernel.org
Subject: Re: [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
Date: Thu,  2 Jul 2026 12:43:53 -0700	[thread overview]
Message-ID: <20260702194353.91778-1-sj@kernel.org> (raw)
In-Reply-To: <akZDj6s8U5_Mnetv@lucifer>

On Thu, 2 Jul 2026 12:07:01 +0100 Lorenzo Stoakes <ljs@kernel.org> wrote:

> (+cc missing people again)

Thank you for adding the recipients, and review, Lorenzo!

> 
> Sorry but we're not going to accept anything that exports THP logic like this at
> all.
> 
> And a damon wrapper in core mm code is just a non-starter, so you really need to
> rethink your approach.
> 
> I think SJ already commented on this in your v1 from what I can see? I'd listen
> to his advice on this :)

Lorenzo is right.  Not disrupting the world outside of mm/damon/ is the first
principle of DAMON development.  Sometimes we may have to make some changes
outside of mm/damon/, but we MUST make it not disruptive, small, and perfectly
aligned with the developers of the area with respects.

> 
> On Thu, Jul 02, 2026 at 05:46:30PM +0800, Lian Wang wrote:
> > Export a thin wrapper around collapse_huge_page() that allows external
> > subsystems such as DAMON to trigger THP collapse on a target address
> > range.
> >
> > Currently restricted to PMD order (HPAGE_PMD_ORDER), since
> > collapse_huge_page() does not yet support arbitrary mTHP orders.
> > The restriction can be relaxed when khugepaged gains mTHP support.
> >
> > The caller must hold a reference to @mm.  Do not hold mmap lock:
> > collapse_huge_page() acquires mmap_read_lock for validation, releases
> > it, then acquires mmap_write_lock for the actual collapse.  Holding
> > an outer mmap_read_lock would cause a self-deadlock when the same
> > thread attempts the inner mmap_write_lock.
> >
> > Co-developed-by: Kunwu Chan <kunwu.chan@gmail.com>
> > Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
> > Signed-off-by: Lian Wang <lianux.mm@gmail.com>
> > Signed-off-by: Lian Wang <lianux.wang@processmission.com>
> 
> This patch is exporting internal mm logic without proper safeguards so it's just
> not something we're going to accept, sorry.
> 
> (Also not sure it's correct to have multiple S-o-b for the same person (unless
> re-tagging a backport or something)? I'm not sure though)

I agree to Lorenzo.

As I replied [1] to patch 1, if you just want to give a credit to your
employer, you could add your employer name on the single tag, like "Lian Wang
(processssmission) <lianux.mm@gmail.com>".

If it is for ensure the replies goes to not only your private inbox but also
corporate email inbox, you can simply Cc: your corporate email address.

> 
> > ---
> >  include/linux/khugepaged.h |  9 ++++++++
> >  mm/khugepaged.c            | 46 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 55 insertions(+)
> >
> > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > index d7a9053ff4fe..f7d49cba712f 100644
> > --- a/include/linux/khugepaged.h
> > +++ b/include/linux/khugepaged.h
> > @@ -20,6 +20,9 @@ extern bool current_is_khugepaged(void);
> >  void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
> >  		bool install_pmd);
> >
> > +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
> > +			       unsigned int target_order);
> 
> No thanks. We're not putting damon wrappers into core code. This is breaking the
> abstraction and letting arbitrary users invoke internal mm logic.

I agree.  First of all, having something with 'damon_' prefix in khugepaged.h
makes no sense.  If we have to expose something, maybe mm/internal.h could be
considered in my opinion.  But for only DAMON usage, include/linux/khugepaged.h
is definitely wrong.

> 
> Plus you're literally exporting this so it can be abused by drivers.

I was raising my eyebrows here.  I show this patch adds EXPORT_SYMBOL_GPL() at
the end.  Lorenzo is completely right.  I believe Lian added that only by a
mistake.

> 
> > +
> >  static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> >  {
> >  	if (mm_flags_test(MMF_VM_HUGEPAGE, oldmm))
> > @@ -47,6 +50,12 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
> >  {
> >  }
> >
> > +static inline int damon_collapse_folio_range(struct mm_struct *mm,
> > +		unsigned long start_addr, unsigned int target_order)
> > +{
> > +	return -EINVAL;
> > +}
> > +
> >  static inline void khugepaged_min_free_kbytes_update(void)
> >  {
> >  }
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 617bca76db49..7fe9ce1e0533 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -3272,3 +3272,49 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >  	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
> >  			: madvise_collapse_errno(last_fail);
> >  }
> > +
> > +/**
> > + * damon_collapse_folio_range() - Collapse base pages in range into a THP
> > + * @mm:         mm_struct of the target process
> > + * @start_addr: start address (must be order-aligned)
> > + * @target_order: page order of the collapse result (currently only
> > + *                HPAGE_PMD_ORDER is supported)
> > + *
> > + * Thin wrapper around collapse_huge_page() for external callers such as
> > + * DAMON.  The caller must hold a reference to @mm.  Do not hold mmap
> 
> This is really fragile and bug bait.
> 
> > + * lock: collapse_huge_page() acquires mmap_read_lock for validation,
> 
> This is just gross, you're now collapsing based on an outdated concept of
> what the current VMA state is...
> 
> You're also losing literally everything that madvise_collapse() does.
> 
> AND you're overriding THP limitations like max_ptes_none, which is horrible...
> 
> > + * releases it, then acquires mmap_write_lock for the collapse.  Holding
> > + * an outer mmap_read_lock would self-deadlock.
> 
> This is a sign the interface is wrong!
> 
> > + *
> > + * Return: 0 on success, -EINVAL on bad arguments, negative error from
> > + *         madvise_collapse_errno() otherwise.
> > + */
> > +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
> > +			       unsigned int target_order)
> > +{
> > +	struct collapse_control *cc;
> > +	enum scan_result result;
> > +
> > +	if (target_order != HPAGE_PMD_ORDER) {
> > +		pr_warn_once("%s: only PMD order (%u) is supported, got %u\n",
> > +			     __func__, HPAGE_PMD_ORDER, target_order);
> > +		return -EINVAL;
> > +	}
> > +	if (start_addr & ((PAGE_SIZE << target_order) - 1))
> > +		return -EINVAL;
> > +
> > +	cc = kmalloc_obj(*cc);
> > +	if (!cc)
> > +		return -ENOMEM;
> > +	cc->is_khugepaged = false;
> > +	cc->progress = 0;
> > +
> > +	lru_add_drain_all();
> 
> This is quite literally a copy/paste from madvise_collapse(). No no no :) code
> duplication like this is also unacceptable.

Thank you for kind review, Lorenzo.

I believe Lorenzo's concerns are valid and should all be addressed before
dropping RFC tag of this patch.  Also, if you "have to" post the next revision
before addressing those to get reviews of changes on other parts of this
series, let's make sure the fact is very clearly described.  In the way, we
can avoid wasting time of reviewers.  For example, you could add a note like
below at the beginning of the next revision of this patch.

"""
NOTE to THP reviewers: This patch is not yet addressing problems that are found
on the previous revision.  This version is posted for review of only other
parts in the series.  Please ignore this patch for now.
"""

> 
> > +
> > +	result = collapse_huge_page(mm, start_addr, 1, 0, cc, target_order);
> > +	kfree(cc);
> > +	if (result == SCAN_SUCCEED || result == SCAN_PMD_MAPPED)
> > +		return 0;
> > +	return madvise_collapse_errno(result);
> > +}
> > +EXPORT_SYMBOL_GPL(damon_collapse_folio_range);
> 
> We _definitely_ cannot have internal mm logic _exported_.

I completely agree.  Please drop this from the next version.

> 
> Yeah sorry you need to rethink this.

I wonder if we could extend madvise_collapse() for mTHP and use it.

I initially thought this is a silly idea, but
'Documentation/admin-guide/mm/transhuge.rst' is saying "currently",
madvise_collapse only supports collapsing to PMD-sized THPs and does not attemp
mTHP collapse.  It reads to me like supporting mTHP from MADV_COLLAPSE is a
possible TODO.  And it makes sense to me.  Users who do MADV_COLLAPSE may want
it to at least have a best-effort mTHP.  The interface of madvise_collapse() is
already receiving arbitrary address range, so fit with DAMON's purpose.

If supporting mTHP with MADV_THP is not a real TODO but just a silly idea that
better not to be implemented, I wonder if we can split out the core of
madvise_collapse(), extend it for mTHP, and expose it on mm/internal.h.

I concern if the above idea makes call depth unnecessarily deep, though.  A
shallower approach would be renaming madvise_collapse() to somewhat general (I
fail at making an example, but whatever without maddvise_ prefix) and extending
it for mTHP support.

Off the topic of this patch, but I find 'madvise_collapse()' is declared in
include/linux/huge_mm.h, while it is being used only inside mm/.  Would it make
sense to move the declaration to mm/internal.h?

[1] https://lore.kernel.org/20260702185137.91227-1-sj@kernel.org


Thanks,
SJ

[...]

  reply	other threads:[~2026-07-02 19:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-07-02  9:46 [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions Lian Wang
2026-07-02  9:46 ` [RESEND RFC PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE Lian Wang
2026-07-02 10:01   ` sashiko-bot
2026-07-02 18:51   ` SJ Park
2026-07-02  9:46 ` [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers Lian Wang
2026-07-02 10:13   ` sashiko-bot
2026-07-02 11:07   ` Lorenzo Stoakes
2026-07-02 19:43     ` SJ Park [this message]
2026-07-02 20:32       ` SJ Park
2026-07-02  9:46 ` [RESEND RFC PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler Lian Wang
2026-07-02 10:26   ` sashiko-bot
2026-07-02 19:56   ` SJ Park
2026-07-02  9:46 ` [RESEND RFC PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action Lian Wang
2026-07-02 10:41   ` sashiko-bot
2026-07-02 20:00   ` SJ Park
2026-07-02  9:46 ` [RESEND RFC PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler Lian Wang
2026-07-02 10:49   ` sashiko-bot
2026-07-02 20:18   ` SJ Park
2026-07-02 10:23 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions Lorenzo Stoakes
2026-07-02 16:52   ` SJ Park
2026-07-02 18:35 ` SJ Park
  -- strict thread matches above, loose matches on Subject: below --
2026-07-02  9:52 Lian Wang
2026-07-02  9:52 ` [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers Lian Wang
2026-07-02 10:08   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260702194353.91778-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=daichaobing@sangfor.com.cn \
    --cc=damon@lists.linux.dev \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=kunwu.chan@gmail.com \
    --cc=lance.yang@linux.dev \
    --cc=liam@infradead.org \
    --cc=lianux.mm@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=npache@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox