* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
[not found] <20260702095227.75866-1-lianux.mm@gmail.com>
@ 2026-07-02 16:39 ` SJ Park
0 siblings, 0 replies; 11+ messages in thread
From: SJ Park @ 2026-07-02 16:39 UTC (permalink / raw)
To: Lian Wang
Cc: SJ Park, damon, linux-mm, linux-kernel, gutierrez.asier,
daichaobing, lianux.wang, kunwu.chan
On Thu, 2 Jul 2026 17:52:22 +0800 Lian Wang <lianux.mm@gmail.com> wrote:
> Resend of v2 with the RFC tag restored (v1 was RFC PATCH, so v2 should
> be RFC PATCH v2).
Somehow you sent this twice. Maybe your email setup issue? You also replied
same message twice to my previous comment. Anyway, I will review the later
posted one: https://lore.kernel.org/20260702094633.75658-1-lianux.mm@gmail.com
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
[not found] ` <akY6ZzPnwk-CToMp@lucifer>
@ 2026-07-02 16:52 ` SJ Park
2026-07-03 19:04 ` SJ Park
2026-07-03 1:10 ` wang lian
2026-07-03 1:35 ` wang lian
2 siblings, 1 reply; 11+ messages in thread
From: SJ Park @ 2026-07-02 16:52 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: SJ Park, Lian Wang, damon, linux-mm, daichaobing, kunwu.chan,
Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, linux-kernel
On Thu, 2 Jul 2026 11:23:55 +0100 Lorenzo Stoakes <ljs@kernel.org> wrote:
> +cc all those you missed.
Thank you for doing this, Lorenzo.
>
> I really need to write a bot to do this, because I'm getting a little tired of
> pointing this out :))
Good idea. I will also consider implementing this kind of checks to to my lzy
tool box [1] or hkml [2].
>
> On Thu, Jul 02, 2026 at 05:46:28PM +0800, Lian Wang wrote:
> > include/linux/damon.h | 10 +++
> > include/linux/khugepaged.h | 9 +++
> > mm/damon/core.c | 2 +
> > mm/damon/sysfs-schemes.c | 77 ++++++++++++++++++++++
> > mm/damon/vaddr.c | 128 +++++++++++++++++++++++++++++++++++++
> > mm/khugepaged.c | 46 +++++++++++++
> > 6 files changed, 272 insertions(+)
>
> You are doing damon changes, and that belongs to SJ, sure.
>
> But you're also changing core THP code? Please ensure you cc- THP people because
> without our approval this cannot be merged:
Thank you for calling out this, Lorenzo.
Lian, please do as Lorenzo kindly asked, from the next revision. You don't
need to add those recipients to all the patches if you worry their inbox
volumes. But do ensure adding them to at least patches that modifies
khugepaged.h and khugepaged.c, and the cover letter.
If it is cumbersome, consider using 'hkml patch format' [3]. It does that (run
get_maintainer.pl and add recipients to each patch and the coverletter) for its
users.
[1] https://github.com/sjp38/lazybox
[2] https://github.com/sjp38/hackermail
[3] https://github.com/sjp38/hackermail/blob/master/USAGE.md#formatting-patches
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
[not found] <20260702094633.75658-1-lianux.mm@gmail.com>
@ 2026-07-02 18:35 ` SJ Park
2026-07-02 20:50 ` SJ Park
[not found] ` <20260702094633.75658-3-lianux.mm@gmail.com>
[not found] ` <akY6ZzPnwk-CToMp@lucifer>
2 siblings, 1 reply; 11+ messages in thread
From: SJ Park @ 2026-07-02 18:35 UTC (permalink / raw)
To: Lian Wang; +Cc: SJ Park, damon, linux-mm, daichaobing, kunwu.chan
On Thu, 2 Jul 2026 17:46:28 +0800 Lian Wang <lianux.mm@gmail.com> wrote:
> Resend of v2 with the RFC tag restored (v1 was RFC PATCH, so v2 should
> be RFC PATCH v2).
>
> This resend also includes fixes for issues identified during review of
> the earlier mis-sent PATCH v2 thread: uninitialized memory, TOCTOU
> races, BUILD_BUG guards, missing sysfs action name registration, and
> stack allocation overflow. The series has been re-tested on aarch64
> (anonymous and file-backed THP split) and is checkpatch clean.
>
> v1: https://lore.kernel.org/linux-mm/20260618094838.32805-1-lianux.mm@gmail.com/
Let's call it 'RFC v1'.
>
> Changes since v1
Ditto.
>
> - Rename DAMOS_MTHP_SPLIT -> DAMOS_SPLIT for naming consistency with
> the existing actions (per SJ's review).
> - Drop the per-scheme hot_threshold field. Hotness policy does not
> belong in the kernel; target selection now lives in user space and
> is expressed to DAMOS via the address filter (per SJ's review).
> - Drop the v1 SPE debugfs patch entirely. debugfs is not the right
> interface for a feature, and the SPE profiler belongs in user space
> (see "User-space target selection" below). v2 is kernel mechanism
> only: 5 patches.
> - Decouple T1 (a lab observation) from T2 (the production issue), and
> correct the architecture claim: ptep_test_and_clear_young() skips
> the TLB flush on both x86_64 and arm64, so the blind spot is
> architecture-independent rather than arm64-only.
> - Terminology: avoid "stale TLB". A valid TLB entry is doing its
> job; the point is only that it lets the CPU satisfy a translation
> without a page-table walk, so the Accessed bit cleared by DAMON is
> not re-set.
Thank you for detailed changelog. This is helpful for reviewers.
>
> Background
>
> Two effects degrade DAMON's PTE-Accessed-bit (AF) signal once THP is
> in play. Both are described here as motivation only; this series does
> not change the AF monitoring path.
>
> T2 -- PMD-granularity inflation (production issue)
I think it is better to call this T1, for readers.
>
> A 2MB THP is tracked by a single PMD-level Accessed bit. One access
> to any 4KB sub-page sets the AF for the whole 2MB, so DAMON reports
> the entire THP as hot and cannot distinguish a genuinely hot 2MB
> region from a 2MB region with a single hot 4KB page. Cold memory
> hides inside "hot" THPs, and access-driven pageout/migration becomes
> coarse.
>
> This is the workload that drove the work: Sangfor's Kunpeng 920 KVM
> hosts running Oracle. ARM SPE sampling of that workload shows 94.6%
> of THPs have fewer than 10% of their sub-pages actually accessed.
Cool finding, thank you for sharing. What DB workloads were running there?
Real production workload? Or, synthetic benchmarks?
On the first read, I was wondering how you did ARM SPE sampling. After reading
this mail to the end, I now understand you use perf. Briefly mentioning that
here would be nice. E.g., "ARM SPE sampling of that worklaod using perf shows
..."
>
> T1 -- TLB-reach blind spot (lab observation)
I think it is better to call this T2, for readers.
>
> When the working set fits within L2 TLB reach (measured at 2048
> entries x 2MB = 4GB on Kunpeng 920; no public data available), the
> CPU satisfies translations entirely from the TLB,
> preventing translation table walks. Because
> ptep_test_and_clear_young() does not flush
Wrapping text for the max columns is nice. But let's not wrap it early when
there are spaces. That could reduce space, and even carbon emissions from
people who want to read this nice cover letter after printing out on a paper.
> the TLB, valid TLB entries continue to satisfy translations and the
> AF that DAMON cleared is never re-set, so DAMON sees nr_accesses=0 for
> memory that is in fact hot, and no scheme triggers. This reproduces
> in the lab with small workloads; it is not something we have seen
> reported from production, where working sets exceed TLB reach.
>
> What this series adds
>
> Rather than change AF monitoring, this series adds two order-aware
> DAMOS actions so a policy layer can act at mTHP granularity:
The background explained rooms to improve in DAMON's THP access "monitoring".
And this patch series is proposing adding new DAMOS actions for THP "handling".
Those are two unrelated things.
I really appreciate sharing your findings with the background, but as those are
not related to the proposal, I think it is better to be shared in a different
way.
I understand you are proposing this change because you know DAMON's hugepages
monitoring is imperfect, but still useful enough to get some benefits. If
there were some findings that made you to think so, that could be good
background.
Also, you may have a reason to believe it is a good idea to use larger mTHP for
hot pages, and smaller mTHP for cold pages. If so, and the description of the
reason is not trivial, that could be good materials to add on background.
>
> - DAMOS_COLLAPSE + target_order (patches 1-3): collapse small folios
This reads like you are introducing a new DAMOS action. You indeed mentioned
"this series adds two order-aware DAMOS actions". That's not completely wrong
in a sense, but more technically speaking you are adding a new mode of
DAMOS_COLLAPSE. I'd recommend rephrasing to "extend" DAMOS_COLLAPSE..
> up to a chosen mTHP order. Patch 1 adds the target_order field and
> its sysfs file; patch 2 exports a khugepaged helper
> (damon_collapse_folio_range());
So patch 2 modifies khugepaged? As Lorenzo mentioned on the other reply, that
change should also be reviewed by THP developers on MAINTAINERS file. Please
ensure adding THP developers to the recipients list of the patch and this cover
letter.
The patch adds damon_collapse_folio_range() to khugepaged.h. I understand
DAMON is the only user for now, and therefore you are adding damon_ prefix to
the name. Not necesasrily DAMON is the only user forever. And having damon_
prefix in a land outside of DAMON feels weird. To be consistent with other
functions like collapse_pte_mapped_thp(), I'd suggest dropping the prefix from
the name.
> patch 3 wires the vaddr handler.
>
> - DAMOS_SPLIT + target_order (patches 4-5): split large folios down
> to a chosen mTHP order via split_folio_to_order(), for both
> anonymous and file-backed (tmpfs/shmem) folios.
>
> The two are complementary, not competing:
>
> THP=never + DAMOS_COLLAPSE: start at 4KB, grow hot regions up.
> THP=always + DAMOS_SPLIT: start at 2MB, shrink cold regions down.
>
> This dual-path design aligns with ideas discussed with Asier
> Gutierrez; we plan to unify our mTHP automation and evaluation
> roadmaps under this standard DAMOS_SPLIT action.
>
> A deployment can pick either baseline, or run both, and let DAMOS
> manage the placement. THP is still wanted for the hot working set
> (fewer TLB misses, shallower walks); the goal is not "no THP" but
> "THP where it is hot, small pages where it is cold."
I think this is a good idea. Could you further elaborate what benefit users
can get from this in more detail, though? Off the top of my head, I can expect
the benefits would be 1) less TLB miss from hot data, and 2) less mTHP
allocation failures from cold data occupying phsically contiguous memory. But
you might showing even more benefits. Anyway I think those are better to be
widely known by our kernel users. Some of those may better to be put on the
background section.
>
> User-space target selection
>
> The decision of *which* regions to collapse or split is left to user
> space and fed to DAMOS through the existing DAMOS address filter
> (DAMOS_FILTER_TYPE_ADDR) -- the interface suggested during v1 review.
> The kernel provides the mechanism; user space provides the policy,
> consistent with the perf/BPF "kernel samples, user space decides"
> model and with the DAMON-X direction.
>
> Because the AF signal is unreliable at PMD granularity (T1/T2), the
> scheme is run with min_nr_accesses=0 so it does not gate on access
> count, and the address filter selects targets. min_nr_accesses=0 is
> also what unblocks the T1 case, where nr_accesses is pinned at 0.
Oh, so you are saying DAMON's huge pages monitoring is too problematic to use
as-is, for your use case. That's completely fair. And that explains what you
really want to do. But this whole pictur is better to be described earlier
than your changes proposal.
From the beginning, explain why using larger mTHP for hot pages and smaller
mTHP for cold pages are good idea. After that, explain how DAMON can be
extended for doing that. Then, you can further explain your T1 and T2 findings
that explain why DAMON-only appraoch is not feasible, and how user-space target
selection can overcome it.
Also, I understand DAMON-only approach is not optimum or just useless for your
aimed use case. But, is it completely useless for every possible use case? I
think it might still provide some benefit in some use cases. Could you pleae
clarify this point more in detail? If you have data showing how useless
DAMON-only appraoch is, and how user space approach improves, it would be
awesome.
>
> Why not just turn khugepaged off? You can, but khugepaged is global
> and usually left enabled because other workloads rely on it; it cannot
> be disabled per region. DAMOS_COLLAPSE gives per-region,
> access-pattern-driven collapse -- a more precise, targeted complement
> to khugepaged's global scan, not a replacement for it. To handle the
> runtime race where khugepaged might aggressively re-collapse what
> DAMOS_SPLIT just split, we are evaluating a precise VMA-level handshake
> or back-off mechanism to prevent ping-pong effects in mixed
> environments.
Good reasoning. However, khugepaged can be turned off per process, using
prctl(). How about turning khugepaged off for the process you want to use
DAMOS_COLLAPSE/SPLIT for?
>
> Two user-space data sources produce the candidate address ranges:
>
> 1. ARM SPE (ARMv8.2+): perf record (SPE) -> per-2MB hot-fraction
> histogram -> PA->VA via /proc/<pid>/pagemap -> sparse-THP VA
> ranges. SPE reads physical addresses from the CPU pipeline,
> bypassing the TLB and page tables, so it is immune to T1 and T2.
>
> 2. smaps fallback (no SPE): scan /proc/<pid>/smaps for THP-backed
> VMAs and treat the 2MB-aligned ranges as split candidates.
>
> The SPE profiler stays in user space deliberately: the SPE PMU is a
> single-consumer resource, so a kernel consumer would lock out
> user-space perf and tooling (x86 PEBS / AMD IBS have the same
> property). Keeping it in user space avoids that and keeps the metric
> source pluggable, in line with DAMON-X.
Maybe you are mentioning the perf events based DAMON, not DAMON-X.
And I understand you plan to extend DAMON to use ARM SPE, on top of the perf
events based DAMON as a future work. As I mentioned before, I think that makes
perfect sense and I'm aligned. Maybe this paragraph can bit reworded to make
it more clear, though.
> This is why v2 drops the v1
> SPE debugfs patch.
>
> Testing
>
> Tested on aarch64 with this series applied to 7.1.0-rc5, THP=always,
> using a DAMOS_SPLIT scheme (target_order=2, min_nr_accesses=0) and a
> single DAMOS address filter selecting one 2MB-aligned range:
>
> - Anonymous THP: the filter splits exactly that one THP --
> sz_applied=2MB and AnonHugePages drops by 2MB, the rest of the
> 256MB mapping untouched.
> - File-backed THP (tmpfs/shmem mounted huge=always): the same setup
> splits exactly one 2MB shmem THP -- sz_applied=2MB and
> ShmemPmdMapped drops by 2MB. This confirms split_folio_to_order()
> works for shmem folios (the KVM-guest-on-THP-tmpfs case).
> - The address filter is what bounds the action: sz_tried covers the
> whole ~2GB monitored region while sz_applied is exactly the 2MB the
> filter selected.
> - A smaps-based path (for hosts without SPE) enumerates THP-backed
> ranges and splits all THP in the target workload.
> - checkpatch clean on all 5 patches.
So, you tested only split part, for functionality. Do you have plans to
further test collapse part, and performance?
>
> Test scripts and SPE-to-DAMON pipeline tools:
> https://github.com/lianux-mm/damon_spe/tree/v2
Thank you for sharing the code!
So, I find rooms to improve on this cover letter for the readability and
clarity of the idea. But as I mentioned before, I like the overall idea of
this series.
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
[not found] ` <akZDj6s8U5_Mnetv@lucifer>
@ 2026-07-02 19:43 ` SJ Park
2026-07-02 20:32 ` SJ Park
2026-07-03 2:06 ` wang lian
1 sibling, 1 reply; 11+ messages in thread
From: SJ Park @ 2026-07-02 19:43 UTC (permalink / raw)
To: Lian Wang
Cc: SJ Park, Lorenzo Stoakes, damon, linux-mm, daichaobing,
kunwu.chan, Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, linux-kernel
On Thu, 2 Jul 2026 12:07:01 +0100 Lorenzo Stoakes <ljs@kernel.org> wrote:
> (+cc missing people again)
Thank you for adding the recipients, and review, Lorenzo!
>
> Sorry but we're not going to accept anything that exports THP logic like this at
> all.
>
> And a damon wrapper in core mm code is just a non-starter, so you really need to
> rethink your approach.
>
> I think SJ already commented on this in your v1 from what I can see? I'd listen
> to his advice on this :)
Lorenzo is right. Not disrupting the world outside of mm/damon/ is the first
principle of DAMON development. Sometimes we may have to make some changes
outside of mm/damon/, but we MUST make it not disruptive, small, and perfectly
aligned with the developers of the area with respects.
>
> On Thu, Jul 02, 2026 at 05:46:30PM +0800, Lian Wang wrote:
> > Export a thin wrapper around collapse_huge_page() that allows external
> > subsystems such as DAMON to trigger THP collapse on a target address
> > range.
> >
> > Currently restricted to PMD order (HPAGE_PMD_ORDER), since
> > collapse_huge_page() does not yet support arbitrary mTHP orders.
> > The restriction can be relaxed when khugepaged gains mTHP support.
> >
> > The caller must hold a reference to @mm. Do not hold mmap lock:
> > collapse_huge_page() acquires mmap_read_lock for validation, releases
> > it, then acquires mmap_write_lock for the actual collapse. Holding
> > an outer mmap_read_lock would cause a self-deadlock when the same
> > thread attempts the inner mmap_write_lock.
> >
> > Co-developed-by: Kunwu Chan <kunwu.chan@gmail.com>
> > Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
> > Signed-off-by: Lian Wang <lianux.mm@gmail.com>
> > Signed-off-by: Lian Wang <lianux.wang@processmission.com>
>
> This patch is exporting internal mm logic without proper safeguards so it's just
> not something we're going to accept, sorry.
>
> (Also not sure it's correct to have multiple S-o-b for the same person (unless
> re-tagging a backport or something)? I'm not sure though)
I agree to Lorenzo.
As I replied [1] to patch 1, if you just want to give a credit to your
employer, you could add your employer name on the single tag, like "Lian Wang
(processssmission) <lianux.mm@gmail.com>".
If it is for ensure the replies goes to not only your private inbox but also
corporate email inbox, you can simply Cc: your corporate email address.
>
> > ---
> > include/linux/khugepaged.h | 9 ++++++++
> > mm/khugepaged.c | 46 ++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 55 insertions(+)
> >
> > diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> > index d7a9053ff4fe..f7d49cba712f 100644
> > --- a/include/linux/khugepaged.h
> > +++ b/include/linux/khugepaged.h
> > @@ -20,6 +20,9 @@ extern bool current_is_khugepaged(void);
> > void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
> > bool install_pmd);
> >
> > +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
> > + unsigned int target_order);
>
> No thanks. We're not putting damon wrappers into core code. This is breaking the
> abstraction and letting arbitrary users invoke internal mm logic.
I agree. First of all, having something with 'damon_' prefix in khugepaged.h
makes no sense. If we have to expose something, maybe mm/internal.h could be
considered in my opinion. But for only DAMON usage, include/linux/khugepaged.h
is definitely wrong.
>
> Plus you're literally exporting this so it can be abused by drivers.
I was raising my eyebrows here. I show this patch adds EXPORT_SYMBOL_GPL() at
the end. Lorenzo is completely right. I believe Lian added that only by a
mistake.
>
> > +
> > static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> > {
> > if (mm_flags_test(MMF_VM_HUGEPAGE, oldmm))
> > @@ -47,6 +50,12 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
> > {
> > }
> >
> > +static inline int damon_collapse_folio_range(struct mm_struct *mm,
> > + unsigned long start_addr, unsigned int target_order)
> > +{
> > + return -EINVAL;
> > +}
> > +
> > static inline void khugepaged_min_free_kbytes_update(void)
> > {
> > }
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 617bca76db49..7fe9ce1e0533 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -3272,3 +3272,49 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
> > : madvise_collapse_errno(last_fail);
> > }
> > +
> > +/**
> > + * damon_collapse_folio_range() - Collapse base pages in range into a THP
> > + * @mm: mm_struct of the target process
> > + * @start_addr: start address (must be order-aligned)
> > + * @target_order: page order of the collapse result (currently only
> > + * HPAGE_PMD_ORDER is supported)
> > + *
> > + * Thin wrapper around collapse_huge_page() for external callers such as
> > + * DAMON. The caller must hold a reference to @mm. Do not hold mmap
>
> This is really fragile and bug bait.
>
> > + * lock: collapse_huge_page() acquires mmap_read_lock for validation,
>
> This is just gross, you're now collapsing based on an outdated concept of
> what the current VMA state is...
>
> You're also losing literally everything that madvise_collapse() does.
>
> AND you're overriding THP limitations like max_ptes_none, which is horrible...
>
> > + * releases it, then acquires mmap_write_lock for the collapse. Holding
> > + * an outer mmap_read_lock would self-deadlock.
>
> This is a sign the interface is wrong!
>
> > + *
> > + * Return: 0 on success, -EINVAL on bad arguments, negative error from
> > + * madvise_collapse_errno() otherwise.
> > + */
> > +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
> > + unsigned int target_order)
> > +{
> > + struct collapse_control *cc;
> > + enum scan_result result;
> > +
> > + if (target_order != HPAGE_PMD_ORDER) {
> > + pr_warn_once("%s: only PMD order (%u) is supported, got %u\n",
> > + __func__, HPAGE_PMD_ORDER, target_order);
> > + return -EINVAL;
> > + }
> > + if (start_addr & ((PAGE_SIZE << target_order) - 1))
> > + return -EINVAL;
> > +
> > + cc = kmalloc_obj(*cc);
> > + if (!cc)
> > + return -ENOMEM;
> > + cc->is_khugepaged = false;
> > + cc->progress = 0;
> > +
> > + lru_add_drain_all();
>
> This is quite literally a copy/paste from madvise_collapse(). No no no :) code
> duplication like this is also unacceptable.
Thank you for kind review, Lorenzo.
I believe Lorenzo's concerns are valid and should all be addressed before
dropping RFC tag of this patch. Also, if you "have to" post the next revision
before addressing those to get reviews of changes on other parts of this
series, let's make sure the fact is very clearly described. In the way, we
can avoid wasting time of reviewers. For example, you could add a note like
below at the beginning of the next revision of this patch.
"""
NOTE to THP reviewers: This patch is not yet addressing problems that are found
on the previous revision. This version is posted for review of only other
parts in the series. Please ignore this patch for now.
"""
>
> > +
> > + result = collapse_huge_page(mm, start_addr, 1, 0, cc, target_order);
> > + kfree(cc);
> > + if (result == SCAN_SUCCEED || result == SCAN_PMD_MAPPED)
> > + return 0;
> > + return madvise_collapse_errno(result);
> > +}
> > +EXPORT_SYMBOL_GPL(damon_collapse_folio_range);
>
> We _definitely_ cannot have internal mm logic _exported_.
I completely agree. Please drop this from the next version.
>
> Yeah sorry you need to rethink this.
I wonder if we could extend madvise_collapse() for mTHP and use it.
I initially thought this is a silly idea, but
'Documentation/admin-guide/mm/transhuge.rst' is saying "currently",
madvise_collapse only supports collapsing to PMD-sized THPs and does not attemp
mTHP collapse. It reads to me like supporting mTHP from MADV_COLLAPSE is a
possible TODO. And it makes sense to me. Users who do MADV_COLLAPSE may want
it to at least have a best-effort mTHP. The interface of madvise_collapse() is
already receiving arbitrary address range, so fit with DAMON's purpose.
If supporting mTHP with MADV_THP is not a real TODO but just a silly idea that
better not to be implemented, I wonder if we can split out the core of
madvise_collapse(), extend it for mTHP, and expose it on mm/internal.h.
I concern if the above idea makes call depth unnecessarily deep, though. A
shallower approach would be renaming madvise_collapse() to somewhat general (I
fail at making an example, but whatever without maddvise_ prefix) and extending
it for mTHP support.
Off the topic of this patch, but I find 'madvise_collapse()' is declared in
include/linux/huge_mm.h, while it is being used only inside mm/. Would it make
sense to move the declaration to mm/internal.h?
[1] https://lore.kernel.org/20260702185137.91227-1-sj@kernel.org
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
2026-07-02 19:43 ` [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers SJ Park
@ 2026-07-02 20:32 ` SJ Park
0 siblings, 0 replies; 11+ messages in thread
From: SJ Park @ 2026-07-02 20:32 UTC (permalink / raw)
To: SJ Park
Cc: Lian Wang, Lorenzo Stoakes, damon, linux-mm, daichaobing,
kunwu.chan, Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, linux-kernel
On Thu, 02 Jul 2026 12:43:59 -0700 SJ Park <sj@kernel.org> wrote:
> On Thu, 2 Jul 2026 12:07:01 +0100 Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> > (+cc missing people again)
>
> Thank you for adding the recipients, and review, Lorenzo!
>
> >
> > Sorry but we're not going to accept anything that exports THP logic like this at
> > all.
> >
> > And a damon wrapper in core mm code is just a non-starter, so you really need to
> > rethink your approach.
> >
> > I think SJ already commented on this in your v1 from what I can see? I'd listen
> > to his advice on this :)
>
> Lorenzo is right. Not disrupting the world outside of mm/damon/ is the first
> principle of DAMON development. Sometimes we may have to make some changes
> outside of mm/damon/, but we MUST make it not disruptive, small, and perfectly
> aligned with the developers of the area with respects.
The best option is just not doing this. And that might be the case.
we already have pmd level DAMOS_COLLAPSE. I find mTHP-supporting DAMOS_SPLIT
can be implemented without any changes on mm/damon/ external world. If it is
true and there is no objection at doing that, mTHP collapse may not really
necessary. That is, the users could collapse in pmd level first, and then
split in desired mTHP level to accomplish their goal. I think that works for
common use cases, too.
It would be suboptimal to collapse in pmd level first and then split. But the
efficiency is unclear. I don't want to disrupt others for unclear gains. We
can upstream split part first, measure the efficiency, and revisit mTHP
collapse if it turns out to be really needed.
What do you think, Lian?
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
2026-07-02 18:35 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
@ 2026-07-02 20:50 ` SJ Park
0 siblings, 0 replies; 11+ messages in thread
From: SJ Park @ 2026-07-02 20:50 UTC (permalink / raw)
To: SJ Park; +Cc: Lian Wang, damon, linux-mm, daichaobing, kunwu.chan
Keeping full original mail, so that Lian can answer all comments in one reply.
On Thu, 2 Jul 2026 11:35:51 -0700 SJ Park <sj@kernel.org> wrote:
> On Thu, 2 Jul 2026 17:46:28 +0800 Lian Wang <lianux.mm@gmail.com> wrote:
>
> > Resend of v2 with the RFC tag restored (v1 was RFC PATCH, so v2 should
> > be RFC PATCH v2).
> >
> > This resend also includes fixes for issues identified during review of
> > the earlier mis-sent PATCH v2 thread: uninitialized memory, TOCTOU
> > races, BUILD_BUG guards, missing sysfs action name registration, and
> > stack allocation overflow. The series has been re-tested on aarch64
> > (anonymous and file-backed THP split) and is checkpatch clean.
> >
> > v1: https://lore.kernel.org/linux-mm/20260618094838.32805-1-lianux.mm@gmail.com/
>
> Let's call it 'RFC v1'.
>
> >
> > Changes since v1
>
> Ditto.
>
> >
> > - Rename DAMOS_MTHP_SPLIT -> DAMOS_SPLIT for naming consistency with
> > the existing actions (per SJ's review).
> > - Drop the per-scheme hot_threshold field. Hotness policy does not
> > belong in the kernel; target selection now lives in user space and
> > is expressed to DAMOS via the address filter (per SJ's review).
> > - Drop the v1 SPE debugfs patch entirely. debugfs is not the right
> > interface for a feature, and the SPE profiler belongs in user space
> > (see "User-space target selection" below). v2 is kernel mechanism
> > only: 5 patches.
> > - Decouple T1 (a lab observation) from T2 (the production issue), and
> > correct the architecture claim: ptep_test_and_clear_young() skips
> > the TLB flush on both x86_64 and arm64, so the blind spot is
> > architecture-independent rather than arm64-only.
> > - Terminology: avoid "stale TLB". A valid TLB entry is doing its
> > job; the point is only that it lets the CPU satisfy a translation
> > without a page-table walk, so the Accessed bit cleared by DAMON is
> > not re-set.
>
> Thank you for detailed changelog. This is helpful for reviewers.
> >
> > Background
> >
> > Two effects degrade DAMON's PTE-Accessed-bit (AF) signal once THP is
> > in play. Both are described here as motivation only; this series does
> > not change the AF monitoring path.
> >
> > T2 -- PMD-granularity inflation (production issue)
>
> I think it is better to call this T1, for readers.
>
> >
> > A 2MB THP is tracked by a single PMD-level Accessed bit. One access
> > to any 4KB sub-page sets the AF for the whole 2MB, so DAMON reports
> > the entire THP as hot and cannot distinguish a genuinely hot 2MB
> > region from a 2MB region with a single hot 4KB page. Cold memory
> > hides inside "hot" THPs, and access-driven pageout/migration becomes
> > coarse.
> >
> > This is the workload that drove the work: Sangfor's Kunpeng 920 KVM
> > hosts running Oracle. ARM SPE sampling of that workload shows 94.6%
> > of THPs have fewer than 10% of their sub-pages actually accessed.
>
> Cool finding, thank you for sharing. What DB workloads were running there?
> Real production workload? Or, synthetic benchmarks?
>
> On the first read, I was wondering how you did ARM SPE sampling. After reading
> this mail to the end, I now understand you use perf. Briefly mentioning that
> here would be nice. E.g., "ARM SPE sampling of that worklaod using perf shows
> ..."
>
> >
> > T1 -- TLB-reach blind spot (lab observation)
>
> I think it is better to call this T2, for readers.
>
> >
> > When the working set fits within L2 TLB reach (measured at 2048
> > entries x 2MB = 4GB on Kunpeng 920; no public data available), the
> > CPU satisfies translations entirely from the TLB,
> > preventing translation table walks. Because
> > ptep_test_and_clear_young() does not flush
>
> Wrapping text for the max columns is nice. But let's not wrap it early when
> there are spaces. That could reduce space, and even carbon emissions from
> people who want to read this nice cover letter after printing out on a paper.
>
> > the TLB, valid TLB entries continue to satisfy translations and the
> > AF that DAMON cleared is never re-set, so DAMON sees nr_accesses=0 for
> > memory that is in fact hot, and no scheme triggers. This reproduces
> > in the lab with small workloads; it is not something we have seen
> > reported from production, where working sets exceed TLB reach.
> >
> > What this series adds
> >
> > Rather than change AF monitoring, this series adds two order-aware
> > DAMOS actions so a policy layer can act at mTHP granularity:
>
> The background explained rooms to improve in DAMON's THP access "monitoring".
> And this patch series is proposing adding new DAMOS actions for THP "handling".
> Those are two unrelated things.
>
> I really appreciate sharing your findings with the background, but as those are
> not related to the proposal, I think it is better to be shared in a different
> way.
>
> I understand you are proposing this change because you know DAMON's hugepages
> monitoring is imperfect, but still useful enough to get some benefits. If
> there were some findings that made you to think so, that could be good
> background.
>
> Also, you may have a reason to believe it is a good idea to use larger mTHP for
> hot pages, and smaller mTHP for cold pages. If so, and the description of the
> reason is not trivial, that could be good materials to add on background.
Now I doubt if we really need two new DAMOS actions. What happens if user asks
DAMOS_COLLAPSE of a target order for region that currently being backed by an
mTHP of an order that is larger than the newly asked one? If we just ignore
the case, DAMOS_SPLIT will really nneeded. But maybe we can just split the
large folio into the newly requested order mTHPs. In this scenario,
DAMOS_SPLIT is not needed?
>
> >
> > - DAMOS_COLLAPSE + target_order (patches 1-3): collapse small folios
>
> This reads like you are introducing a new DAMOS action. You indeed mentioned
> "this series adds two order-aware DAMOS actions". That's not completely wrong
> in a sense, but more technically speaking you are adding a new mode of
> DAMOS_COLLAPSE. I'd recommend rephrasing to "extend" DAMOS_COLLAPSE..
>
> > up to a chosen mTHP order. Patch 1 adds the target_order field and
> > its sysfs file; patch 2 exports a khugepaged helper
> > (damon_collapse_folio_range());
>
> So patch 2 modifies khugepaged? As Lorenzo mentioned on the other reply, that
> change should also be reviewed by THP developers on MAINTAINERS file. Please
> ensure adding THP developers to the recipients list of the patch and this cover
> letter.
>
> The patch adds damon_collapse_folio_range() to khugepaged.h. I understand
> DAMON is the only user for now, and therefore you are adding damon_ prefix to
> the name. Not necesasrily DAMON is the only user forever. And having damon_
> prefix in a land outside of DAMON feels weird. To be consistent with other
> functions like collapse_pte_mapped_thp(), I'd suggest dropping the prefix from
> the name.
>
> > patch 3 wires the vaddr handler.
> >
> > - DAMOS_SPLIT + target_order (patches 4-5): split large folios down
> > to a chosen mTHP order via split_folio_to_order(), for both
> > anonymous and file-backed (tmpfs/shmem) folios.
> >
> > The two are complementary, not competing:
> >
> > THP=never + DAMOS_COLLAPSE: start at 4KB, grow hot regions up.
> > THP=always + DAMOS_SPLIT: start at 2MB, shrink cold regions down.
> >
> > This dual-path design aligns with ideas discussed with Asier
> > Gutierrez; we plan to unify our mTHP automation and evaluation
> > roadmaps under this standard DAMOS_SPLIT action.
> >
> > A deployment can pick either baseline, or run both, and let DAMOS
> > manage the placement. THP is still wanted for the hot working set
> > (fewer TLB misses, shallower walks); the goal is not "no THP" but
> > "THP where it is hot, small pages where it is cold."
>
> I think this is a good idea. Could you further elaborate what benefit users
> can get from this in more detail, though? Off the top of my head, I can expect
> the benefits would be 1) less TLB miss from hot data, and 2) less mTHP
> allocation failures from cold data occupying phsically contiguous memory. But
> you might showing even more benefits. Anyway I think those are better to be
> widely known by our kernel users. Some of those may better to be put on the
> background section.
>
> >
> > User-space target selection
> >
> > The decision of *which* regions to collapse or split is left to user
> > space and fed to DAMOS through the existing DAMOS address filter
> > (DAMOS_FILTER_TYPE_ADDR) -- the interface suggested during v1 review.
> > The kernel provides the mechanism; user space provides the policy,
> > consistent with the perf/BPF "kernel samples, user space decides"
> > model and with the DAMON-X direction.
> >
> > Because the AF signal is unreliable at PMD granularity (T1/T2), the
> > scheme is run with min_nr_accesses=0 so it does not gate on access
> > count, and the address filter selects targets. min_nr_accesses=0 is
> > also what unblocks the T1 case, where nr_accesses is pinned at 0.
>
> Oh, so you are saying DAMON's huge pages monitoring is too problematic to use
> as-is, for your use case. That's completely fair. And that explains what you
> really want to do. But this whole pictur is better to be described earlier
> than your changes proposal.
>
> From the beginning, explain why using larger mTHP for hot pages and smaller
> mTHP for cold pages are good idea. After that, explain how DAMON can be
> extended for doing that. Then, you can further explain your T1 and T2 findings
> that explain why DAMON-only appraoch is not feasible, and how user-space target
> selection can overcome it.
>
> Also, I understand DAMON-only approach is not optimum or just useless for your
> aimed use case. But, is it completely useless for every possible use case? I
> think it might still provide some benefit in some use cases. Could you pleae
> clarify this point more in detail? If you have data showing how useless
> DAMON-only appraoch is, and how user space approach improves, it would be
> awesome.
>
> >
> > Why not just turn khugepaged off? You can, but khugepaged is global
> > and usually left enabled because other workloads rely on it; it cannot
> > be disabled per region. DAMOS_COLLAPSE gives per-region,
> > access-pattern-driven collapse -- a more precise, targeted complement
> > to khugepaged's global scan, not a replacement for it. To handle the
> > runtime race where khugepaged might aggressively re-collapse what
> > DAMOS_SPLIT just split, we are evaluating a precise VMA-level handshake
> > or back-off mechanism to prevent ping-pong effects in mixed
> > environments.
>
> Good reasoning. However, khugepaged can be turned off per process, using
> prctl(). How about turning khugepaged off for the process you want to use
> DAMOS_COLLAPSE/SPLIT for?
>
> >
> > Two user-space data sources produce the candidate address ranges:
> >
> > 1. ARM SPE (ARMv8.2+): perf record (SPE) -> per-2MB hot-fraction
> > histogram -> PA->VA via /proc/<pid>/pagemap -> sparse-THP VA
> > ranges. SPE reads physical addresses from the CPU pipeline,
> > bypassing the TLB and page tables, so it is immune to T1 and T2.
> >
> > 2. smaps fallback (no SPE): scan /proc/<pid>/smaps for THP-backed
> > VMAs and treat the 2MB-aligned ranges as split candidates.
> >
> > The SPE profiler stays in user space deliberately: the SPE PMU is a
> > single-consumer resource, so a kernel consumer would lock out
> > user-space perf and tooling (x86 PEBS / AMD IBS have the same
> > property). Keeping it in user space avoids that and keeps the metric
> > source pluggable, in line with DAMON-X.
>
> Maybe you are mentioning the perf events based DAMON, not DAMON-X.
>
> And I understand you plan to extend DAMON to use ARM SPE, on top of the perf
> events based DAMON as a future work. As I mentioned before, I think that makes
> perfect sense and I'm aligned. Maybe this paragraph can bit reworded to make
> it more clear, though.
>
> > This is why v2 drops the v1
> > SPE debugfs patch.
> >
> > Testing
> >
> > Tested on aarch64 with this series applied to 7.1.0-rc5, THP=always,
> > using a DAMOS_SPLIT scheme (target_order=2, min_nr_accesses=0) and a
> > single DAMOS address filter selecting one 2MB-aligned range:
> >
> > - Anonymous THP: the filter splits exactly that one THP --
> > sz_applied=2MB and AnonHugePages drops by 2MB, the rest of the
> > 256MB mapping untouched.
> > - File-backed THP (tmpfs/shmem mounted huge=always): the same setup
> > splits exactly one 2MB shmem THP -- sz_applied=2MB and
> > ShmemPmdMapped drops by 2MB. This confirms split_folio_to_order()
> > works for shmem folios (the KVM-guest-on-THP-tmpfs case).
> > - The address filter is what bounds the action: sz_tried covers the
> > whole ~2GB monitored region while sz_applied is exactly the 2MB the
> > filter selected.
> > - A smaps-based path (for hosts without SPE) enumerates THP-backed
> > ranges and splits all THP in the target workload.
> > - checkpatch clean on all 5 patches.
>
> So, you tested only split part, for functionality. Do you have plans to
> further test collapse part, and performance?
>
> >
> > Test scripts and SPE-to-DAMON pipeline tools:
> > https://github.com/lianux-mm/damon_spe/tree/v2
>
> Thank you for sharing the code!
>
> So, I find rooms to improve on this cover letter for the readability and
> clarity of the idea. But as I mentioned before, I like the overall idea of
> this series.
>
>
> Thanks,
> SJ
>
> [...]
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
[not found] ` <akY6ZzPnwk-CToMp@lucifer>
2026-07-02 16:52 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
@ 2026-07-03 1:10 ` wang lian
2026-07-03 16:37 ` SJ Park
2026-07-03 1:35 ` wang lian
2 siblings, 1 reply; 11+ messages in thread
From: wang lian @ 2026-07-03 1:10 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: sj, damon, linux-mm, daichaobing, kunwu.chan, Andrew Morton,
David Hildenbrand, Zi Yan, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4305 bytes --]
Hi,Lorenzo
> On Jul 2, 2026, at 18:23, Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> +cc all those you missed.
>
> I really need to write a bot to do this, because I'm getting a little tired of
> pointing this out :))
>
> On Thu, Jul 02, 2026 at 05:46:28PM +0800, Lian Wang wrote:
>> include/linux/damon.h | 10 +++
>> include/linux/khugepaged.h | 9 +++
>> mm/damon/core.c | 2 +
>> mm/damon/sysfs-schemes.c | 77 ++++++++++++++++++++++
>> mm/damon/vaddr.c | 128 +++++++++++++++++++++++++++++++++++++
>> mm/khugepaged.c | 46 +++++++++++++
>> 6 files changed, 272 insertions(+)
>
> You are doing damon changes, and that belongs to SJ, sure.
>
> But you're also changing core THP code? Please ensure you cc- THP people because
> without our approval this cannot be merged:
>
> $ scripts/get_maintainer.pl 20260702094633.75658-1-lianux.mm@gmail.com.mbx <mailto:20260702094633.75658-1-lianux.mm@gmail.com.mbx>
> SJ Park <sj@kernel.org <mailto:sj@kernel.org>> (maintainer:DAMON)
> Andrew Morton <akpm@linux-foundation.org <mailto:akpm@linux-foundation.org>> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> David Hildenbrand <david@kernel.org <mailto:david@kernel.org>> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Lorenzo Stoakes <ljs@kernel.org <mailto:ljs@kernel.org>> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Zi Yan <ziy@nvidia.com <mailto:ziy@nvidia.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Baolin Wang <baolin.wang@linux.alibaba.com <mailto:baolin.wang@linux.alibaba.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> "Liam R. Howlett" <liam@infradead.org <mailto:liam@infradead.org>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Nico Pache <npache@redhat.com <mailto:npache@redhat.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Ryan Roberts <ryan.roberts@arm.com <mailto:ryan.roberts@arm.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Dev Jain <dev.jain@arm.com <mailto:dev.jain@arm.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Barry Song <baohua@kernel.org <mailto:baohua@kernel.org>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Lance Yang <lance.yang@linux.dev <mailto:lance.yang@linux.dev>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> damon@lists.linux.dev <mailto:damon@lists.linux.dev> (open list:DAMON)
> linux-mm@kvack.org <mailto:linux-mm@kvack.org> (open list:DAMON)
> linux-kernel@vger.kernel.org <mailto:linux-kernel@vger.kernel.org> (open list)
>
>>
>> --
>> 2.50.1 (Apple Git-155)
>>
>
Thank you for pointing this out again, and I deeply appreciate all the tedious
work you do as a maintainer to keep the mailing lists aligned.
As a newcomer trying to transition into complex feature development, I must
admit my initial design is far from perfect. I leveraged AI tools to assist
with certain implementation paths, and although I extensively self-reviewed,
tested, and reasoned through every single line of code, your and SJ's
feedback clearly shows there are still major architectural flaws. Just like my
very first kernel patch—which was strictly reviewed and guided by you and David H.
—I know that your sharp feedback is exactly what will help me grow as a kernel engineer.
I will completely step back and radically rethink the design of this series,
especially regarding the cross-subsystem encapsulation and locking hazards.
As for that bot you mentioned getting tired of running manually—let me write
it for you. An imperfect newcomer who just made these exact mistakes knows
precisely where the traps are. I can craft a pre-flight patch check script or
a CI bot that automatically validates the `get_maintainer.pl` output against
the patch file diffstat before any email is dispatched, throwing a loud warning
if core MM files are touched but their respective maintainers are missing from
the CC line.
What do you think? I would love to build this tool to save your and other
maintainers' valuable time in the future.
Thanks again for steering me in the right direction!
Best regards,
Wang Lian
> Thanks, Lorenzo
[-- Attachment #2: Type: text/html, Size: 37873 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
[not found] ` <akY6ZzPnwk-CToMp@lucifer>
2026-07-02 16:52 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
2026-07-03 1:10 ` wang lian
@ 2026-07-03 1:35 ` wang lian
2 siblings, 0 replies; 11+ messages in thread
From: wang lian @ 2026-07-03 1:35 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: sj, damon, linux-mm, daichaobing, kunwu.chan, Andrew Morton,
David Hildenbrand, Zi Yan, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
linux-kernel
Hi Lorenzo,
> On Jul 2, 2026, at 18:23, Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> +cc all those you missed.
>
> I really need to write a bot to do this, because I'm getting a little tired of
> pointing this out :))
>
> On Thu, Jul 02, 2026 at 05:46:28PM +0800, Lian Wang wrote:
>> include/linux/damon.h | 10 +++
>> include/linux/khugepaged.h | 9 +++
>> mm/damon/core.c | 2 +
>> mm/damon/sysfs-schemes.c | 77 ++++++++++++++++++++++
>> mm/damon/vaddr.c | 128 +++++++++++++++++++++++++++++++++++++
>> mm/khugepaged.c | 46 +++++++++++++
>> 6 files changed, 272 insertions(+)
>
> You are doing damon changes, and that belongs to SJ, sure.
>
> But you're also changing core THP code? Please ensure you cc- THP people because
> without our approval this cannot be merged:
>
> $ scripts/get_maintainer.pl 20260702094633.75658-1-lianux.mm@gmail.com.mbx
> SJ Park <sj@kernel.org> (maintainer:DAMON)
> Andrew Morton <akpm@linux-foundation.org> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> David Hildenbrand <david@kernel.org> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Lorenzo Stoakes <ljs@kernel.org> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Zi Yan <ziy@nvidia.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Baolin Wang <baolin.wang@linux.alibaba.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> "Liam R. Howlett" <liam@infradead.org> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Nico Pache <npache@redhat.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Ryan Roberts <ryan.roberts@arm.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Dev Jain <dev.jain@arm.com> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Barry Song <baohua@kernel.org> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> Lance Yang <lance.yang@linux.dev> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> damon@lists.linux.dev (open list:DAMON)
> linux-mm@kvack.org (open list:DAMON)
> linux-kernel@vger.kernel.org (open list)
>
>>
>> --
>> 2.50.1 (Apple Git-155)
>>
>
Thank you for pointing this out again, and I deeply appreciate
all the tedious work you do as a maintainer to keep the mailing
lists aligned.
As a newcomer trying to transition into complex feature
development, I must admit my initial design is far from perfect.
I leveraged AI tools to assist with certain implementation
paths, and although I extensively self-reviewed, tested, and
reasoned through every single line of code, your and SJ's
feedback clearly shows there are still major architectural
flaws. Just like my very first kernel patch -- which was
strictly reviewed and guided by you and David H. -- I know that
your sharp feedback is exactly what will help me grow as a
kernel engineer.
I will completely step back and radically rethink the design of
this series, especially regarding the cross-subsystem
encapsulation and locking hazards.
As for that bot you mentioned getting tired of running manually
-- let me write it for you. An imperfect newcomer who just made
these exact mistakes knows precisely where the traps are. I can
craft a pre-flight patch check script or a CI bot that
automatically validates the get_maintainer.pl output against the
patch file diffstat before any email is dispatched, throwing a
loud warning if core MM files are touched but their respective
maintainers are missing from the CC line.
What do you think? I would love to build this tool to save your
and other maintainers' valuable time in the future.
Thanks again for steering me in the right direction!
Best regards,
Wang Lian
> Thanks, Lorenzo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
[not found] ` <akZDj6s8U5_Mnetv@lucifer>
2026-07-02 19:43 ` [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers SJ Park
@ 2026-07-03 2:06 ` wang lian
1 sibling, 0 replies; 11+ messages in thread
From: wang lian @ 2026-07-03 2:06 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: sj, damon, linux-mm, daichaobing, kunwu.chan, Andrew Morton,
David Hildenbrand, Zi Yan, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 7297 bytes --]
Hi Lorenzo,
> On Jul 2, 2026, at 19:07, Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> (+cc missing people again)
>
> Sorry but we're not going to accept anything that exports THP logic like this at
> all.
>
> And a damon wrapper in core mm code is just a non-starter, so you really need to
> rethink your approach.
>
> I think SJ already commented on this in your v1 from what I can see? I'd listen
> to his advice on this :)
>
> On Thu, Jul 02, 2026 at 05:46:30PM +0800, Lian Wang wrote:
>> Export a thin wrapper around collapse_huge_page() that allows external
>> subsystems such as DAMON to trigger THP collapse on a target address
>> range.
>>
>> Currently restricted to PMD order (HPAGE_PMD_ORDER), since
>> collapse_huge_page() does not yet support arbitrary mTHP orders.
>> The restriction can be relaxed when khugepaged gains mTHP support.
>>
>> The caller must hold a reference to @mm. Do not hold mmap lock:
>> collapse_huge_page() acquires mmap_read_lock for validation, releases
>> it, then acquires mmap_write_lock for the actual collapse. Holding
>> an outer mmap_read_lock would cause a self-deadlock when the same
>> thread attempts the inner mmap_write_lock.
>>
>> Co-developed-by: Kunwu Chan <kunwu.chan@gmail.com>
>> Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
>> Signed-off-by: Lian Wang <lianux.mm@gmail.com>
>> Signed-off-by: Lian Wang <lianux.wang@processmission.com>
>
> This patch is exporting internal mm logic without proper safeguards so it's just
> not something we're going to accept, sorry.
>
> (Also not sure it's correct to have multiple S-o-b for the same person (unless
> re-tagging a backport or something)? I'm not sure though)
>
>> ---
>> include/linux/khugepaged.h | 9 ++++++++
>> mm/khugepaged.c | 46 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 55 insertions(+)
>>
>> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
>> index d7a9053ff4fe..f7d49cba712f 100644
>> --- a/include/linux/khugepaged.h
>> +++ b/include/linux/khugepaged.h
>> @@ -20,6 +20,9 @@ extern bool current_is_khugepaged(void);
>> void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>> bool install_pmd);
>>
>> +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
>> + unsigned int target_order);
>
> No thanks. We're not putting damon wrappers into core code. This is breaking the
> abstraction and letting arbitrary users invoke internal mm logic.
>
> Plus you're literally exporting this so it can be abused by drivers.
>
>> +
>> static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
>> {
>> if (mm_flags_test(MMF_VM_HUGEPAGE, oldmm))
>> @@ -47,6 +50,12 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
>> {
>> }
>>
>> +static inline int damon_collapse_folio_range(struct mm_struct *mm,
>> + unsigned long start_addr, unsigned int target_order)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> static inline void khugepaged_min_free_kbytes_update(void)
>> {
>> }
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 617bca76db49..7fe9ce1e0533 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -3272,3 +3272,49 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>> return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
>> : madvise_collapse_errno(last_fail);
>> }
>> +
>> +/**
>> + * damon_collapse_folio_range() - Collapse base pages in range into a THP
>> + * @mm: mm_struct of the target process
>> + * @start_addr: start address (must be order-aligned)
>> + * @target_order: page order of the collapse result (currently only
>> + * HPAGE_PMD_ORDER is supported)
>> + *
>> + * Thin wrapper around collapse_huge_page() for external callers such as
>> + * DAMON. The caller must hold a reference to @mm. Do not hold mmap
>
> This is really fragile and bug bait.
>
>> + * lock: collapse_huge_page() acquires mmap_read_lock for validation,
>
> This is just gross, you're now collapsing based on an outdated concept of
> what the current VMA state is...
>
> You're also losing literally everything that madvise_collapse() does.
>
> AND you're overriding THP limitations like max_ptes_none, which is horrible...
>
>> + * releases it, then acquires mmap_write_lock for the collapse. Holding
>> + * an outer mmap_read_lock would self-deadlock.
>
> This is a sign the interface is wrong!
>
>> + *
>> + * Return: 0 on success, -EINVAL on bad arguments, negative error from
>> + * madvise_collapse_errno() otherwise.
>> + */
>> +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
>> + unsigned int target_order)
>> +{
>> + struct collapse_control *cc;
>> + enum scan_result result;
>> +
>> + if (target_order != HPAGE_PMD_ORDER) {
>> + pr_warn_once("%s: only PMD order (%u) is supported, got %u\n",
>> + __func__, HPAGE_PMD_ORDER, target_order);
>> + return -EINVAL;
>> + }
>> + if (start_addr & ((PAGE_SIZE << target_order) - 1))
>> + return -EINVAL;
>> +
>> + cc = kmalloc_obj(*cc);
>> + if (!cc)
>> + return -ENOMEM;
>> + cc->is_khugepaged = false;
>> + cc->progress = 0;
>> +
>> + lru_add_drain_all();
>
> This is quite literally a copy/paste from madvise_collapse(). No no no :) code
> duplication like this is also unacceptable.
>
>> +
>> + result = collapse_huge_page(mm, start_addr, 1, 0, cc, target_order);
>> + kfree(cc);
>> + if (result == SCAN_SUCCEED || result == SCAN_PMD_MAPPED)
>> + return 0;
>> + return madvise_collapse_errno(result);
>> +}
>> +EXPORT_SYMBOL_GPL(damon_collapse_folio_range);
>
> We _definitely_ cannot have internal mm logic _exported_.
>
> Yeah sorry you need to rethink this.
>
Thank you for the incredibly candid and sharp review. You are
completely right, and I apologize for my rashness.
My initial motivation was purely focused on solving a pressing
production issue we encountered, but in my eagerness, I rushed
into the implementation without respecting the structural
boundaries of the core MM code. Bypassing the proper mmap_lock
state checks, duplicating the madvise validation logic, and
blindly exporting internal MM machinery via EXPORT_SYMBOL_GPL
creates fragile bug bait. It was reckless of me, and I feel I
have let down the valuable time and enthusiasm you and SJ
invested in reviewing this work.
I will completely withdraw this design, step back, and
fundamentally rethink how to achieve this functionality
gracefully within proper subsystem boundaries.
Please give me some time to restructure this into a safe and
elegant design. In the meantime, I want to channel this energy
into doing something genuinely useful for you and the community.
I will take on the development of the patch pre-flight check bot
we discussed to automate the verification of CC lists based on
scripts/get_maintainer.pl. I hope this tool can help save your
valuable time and prevent other newcomers from repeating my
exact blunders.
Thank you again for your patience and for steering me back onto
the right path.
Best regards,
Wang Lian
> Thanks, Lorenzo
[-- Attachment #2: Type: text/html, Size: 44049 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
2026-07-03 1:10 ` wang lian
@ 2026-07-03 16:37 ` SJ Park
0 siblings, 0 replies; 11+ messages in thread
From: SJ Park @ 2026-07-03 16:37 UTC (permalink / raw)
To: wang lian
Cc: SJ Park, Lorenzo Stoakes, damon, linux-mm, daichaobing,
kunwu.chan, Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, linux-kernel
On Fri, 3 Jul 2026 09:10:51 +0800 wang lian <lianux.mm@gmail.com> wrote:
> Hi,Lorenzo
>
> > On Jul 2, 2026, at 18:23, Lorenzo Stoakes <ljs@kernel.org> wrote:
> >
> > +cc all those you missed.
> >
> > I really need to write a bot to do this, because I'm getting a little tired of
> > pointing this out :))
> >
> > On Thu, Jul 02, 2026 at 05:46:28PM +0800, Lian Wang wrote:
> >> include/linux/damon.h | 10 +++
> >> include/linux/khugepaged.h | 9 +++
> >> mm/damon/core.c | 2 +
> >> mm/damon/sysfs-schemes.c | 77 ++++++++++++++++++++++
> >> mm/damon/vaddr.c | 128 +++++++++++++++++++++++++++++++++++++
> >> mm/khugepaged.c | 46 +++++++++++++
> >> 6 files changed, 272 insertions(+)
> >
> > You are doing damon changes, and that belongs to SJ, sure.
> >
> > But you're also changing core THP code? Please ensure you cc- THP people because
> > without our approval this cannot be merged:
> >
> > $ scripts/get_maintainer.pl 20260702094633.75658-1-lianux.mm@gmail.com.mbx <mailto:20260702094633.75658-1-lianux.mm@gmail.com.mbx>
> > SJ Park <sj@kernel.org <mailto:sj@kernel.org>> (maintainer:DAMON)
> > Andrew Morton <akpm@linux-foundation.org <mailto:akpm@linux-foundation.org>> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > David Hildenbrand <david@kernel.org <mailto:david@kernel.org>> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Lorenzo Stoakes <ljs@kernel.org <mailto:ljs@kernel.org>> (maintainer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Zi Yan <ziy@nvidia.com <mailto:ziy@nvidia.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Baolin Wang <baolin.wang@linux.alibaba.com <mailto:baolin.wang@linux.alibaba.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > "Liam R. Howlett" <liam@infradead.org <mailto:liam@infradead.org>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Nico Pache <npache@redhat.com <mailto:npache@redhat.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Ryan Roberts <ryan.roberts@arm.com <mailto:ryan.roberts@arm.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Dev Jain <dev.jain@arm.com <mailto:dev.jain@arm.com>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Barry Song <baohua@kernel.org <mailto:baohua@kernel.org>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > Lance Yang <lance.yang@linux.dev <mailto:lance.yang@linux.dev>> (reviewer:MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE))
> > damon@lists.linux.dev <mailto:damon@lists.linux.dev> (open list:DAMON)
> > linux-mm@kvack.org <mailto:linux-mm@kvack.org> (open list:DAMON)
> > linux-kernel@vger.kernel.org <mailto:linux-kernel@vger.kernel.org> (open list)
> >
> >>
> >> --
> >> 2.50.1 (Apple Git-155)
> >>
> >
>
> Thank you for pointing this out again, and I deeply appreciate all the tedious
> work you do as a maintainer to keep the mailing lists aligned.
>
> As a newcomer trying to transition into complex feature development, I must
> admit my initial design is far from perfect. I leveraged AI tools to assist
> with certain implementation paths, and although I extensively self-reviewed,
> tested, and reasoned through every single line of code, your and SJ's
> feedback clearly shows there are still major architectural flaws. Just like my
> very first kernel patch—which was strictly reviewed and guided by you and David H.
> —I know that your sharp feedback is exactly what will help me grow as a kernel engineer.
>
> I will completely step back and radically rethink the design of this series,
> especially regarding the cross-subsystem encapsulation and locking hazards.
No worries, take your time :)
>
> As for that bot you mentioned getting tired of running manually—let me write
> it for you. An imperfect newcomer who just made these exact mistakes knows
> precisely where the traps are. I can craft a pre-flight patch check script or
> a CI bot that automatically validates the `get_maintainer.pl` output against
> the patch file diffstat before any email is dispatched, throwing a loud warning
> if core MM files are touched but their respective maintainers are missing from
> the CC line.
>
> What do you think? I would love to build this tool to save your and other
> maintainers' valuable time in the future.
Sounds great! hkml already has all building blocks for doing the check, so I'm
planning to develop such checks in hkml, but only for manual runs. I ain't
develop and run a bot, since I don't have that much compute resource. Maybe
you could save your time by reusing the hkml feature on your bot.
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
2026-07-02 16:52 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
@ 2026-07-03 19:04 ` SJ Park
0 siblings, 0 replies; 11+ messages in thread
From: SJ Park @ 2026-07-03 19:04 UTC (permalink / raw)
To: SJ Park
Cc: Lorenzo Stoakes, Lian Wang, damon, linux-mm, daichaobing,
kunwu.chan, Andrew Morton, David Hildenbrand, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, linux-kernel
On Thu, 2 Jul 2026 09:52:34 -0700 SJ Park <sj@kernel.org> wrote:
> On Thu, 2 Jul 2026 11:23:55 +0100 Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> > +cc all those you missed.
>
> Thank you for doing this, Lorenzo.
>
> >
> > I really need to write a bot to do this, because I'm getting a little tired of
> > pointing this out :))
>
> Good idea. I will also consider implementing this kind of checks to to my lzy
> tool box [1] or hkml [2].
I implemented [1] the check logic on hkml. Many things to fix and improve in
future, but seems working at least for this series. You can use it like below:
$ hkml patch check --check_recipients only ./0002-mm-khugepaged-add-damon-collapse-folio-range-for-external-callers.patch
MISSING RECIPIENTS for [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
- Andrew Morton <akpm@linux-foundation.org>
- David Hildenbrand <david@kernel.org>
- Lorenzo Stoakes <ljs@kernel.org>
- Zi Yan <ziy@nvidia.com>
- Baolin Wang <baolin.wang@linux.alibaba.com>
- "Liam R. Howlett" <liam@infradead.org>
- Nico Pache <npache@redhat.com>
- Ryan Roberts <ryan.roberts@arm.com>
- Dev Jain <dev.jain@arm.com>
- Barry Song <baohua@kernel.org>
- Lance Yang <lance.yang@linux.dev>
- Usama Arif <usama.arif@linux.dev>
- linux-kernel@vger.kernel.org
$ hkml patch check --check_recipients only ./0003-mm-damon-vaddr-implement-mthp-aware-damos-collapse-handler.patch
MISSING RECIPIENTS for [RESEND RFC PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler
- Andrew Morton <akpm@linux-foundation.org>
- linux-kernel@vger.kernel.org
It is also integrated with the interactive mails list [2], so that you can run
it without manually downloading the patch files.
FWIW, 'hkml patch format' automatically adds recipients based on
get_maintainer.pl, so this check is unnecessary for hkml users. But this will
help me reviewing patches that not formatted with hkml or other similar
scripts.
hkml has a feature [3] to monitor new mails. Extending it to run this new
check and send mails for check failures could be one of the ways to implement
the bot. I'm not planning to do that by myself, though. I will run the check
only for patches that look interesting to me, for now.
[1] https://github.com/sjp38/hackermail/commit/9ca6fa1f71edc7a219edeb41d4c7f91
[2] https://github.com/sjp38/hackermail/blob/master/USAGE.md#interactive-viewer
[3] https://github.com/sjp38/hackermail/blob/master/USAGE.md#monitoring-mails
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-07-03 19:04 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260702094633.75658-1-lianux.mm@gmail.com>
2026-07-02 18:35 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
2026-07-02 20:50 ` SJ Park
[not found] ` <20260702094633.75658-3-lianux.mm@gmail.com>
[not found] ` <akZDj6s8U5_Mnetv@lucifer>
2026-07-02 19:43 ` [RESEND RFC PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers SJ Park
2026-07-02 20:32 ` SJ Park
2026-07-03 2:06 ` wang lian
[not found] ` <akY6ZzPnwk-CToMp@lucifer>
2026-07-02 16:52 ` [RESEND RFC PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
2026-07-03 19:04 ` SJ Park
2026-07-03 1:10 ` wang lian
2026-07-03 16:37 ` SJ Park
2026-07-03 1:35 ` wang lian
[not found] <20260702095227.75866-1-lianux.mm@gmail.com>
2026-07-02 16:39 ` SJ Park
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox