[PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS

public inbox for linux-mm@kvack.org

* [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
@ 2026-03-30 14:57 gutierrez.asier
  2026-03-30 23:43 ` (sashiko review) " SeongJae Park
  2026-03-31  1:31 ` SeongJae Park
  0 siblings, 2 replies; 9+ messages in thread
From: gutierrez.asier @ 2026-03-30 14:57 UTC (permalink / raw)
  To: gutierrez.asier, artem.kuzin, stepanov.anatoly, wangkefeng.wang,
	yanquanmin1, zuoze1, damon, sj, akpm, linux-mm, linux-kernel

From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>

This patch introduces a new DAMOS action: DAMOS_COLLAPSE.

For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
running, since these actions rely on hugepage_madvise() to add a new
slot. This slot should be picked up by khugepaged, which eventually
collapses the pages (or not, if we are using DAMOS_NOHUGEPAGE). If THP
is not enabled, khugepaged will not be running, and therefore no
collapse will happen.

DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse
the address range synchronously.
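
To make the intended usage concrete, here is a minimal sketch of the
DAMON sysfs writes that would select the new action. It assumes the
standard DAMON sysfs layout under /sys/kernel/mm/damon/admin and the
"collapse" keyword added by this patch. The helper names are
hypothetical, and the access pattern, quotas and intervals are left at
their defaults, which a real setup would need to tune.

```python
import os

# Root of the DAMON sysfs interface on kernels built with CONFIG_DAMON_SYSFS.
DAMON_ADMIN = "/sys/kernel/mm/damon/admin"


def collapse_scheme_writes(pid):
    """Return the ordered (path, value) writes for one kdamond that monitors
    `pid` with a single scheme whose action is "collapse".

    Nothing is written here; the caller decides whether to apply the plan.
    """
    kdamonds = os.path.join(DAMON_ADMIN, "kdamonds")
    kd = os.path.join(kdamonds, "0")
    ctx = os.path.join(kd, "contexts", "0")
    sch = os.path.join(ctx, "schemes", "0")
    return [
        (os.path.join(kdamonds, "nr_kdamonds"), "1"),
        (os.path.join(kd, "contexts", "nr_contexts"), "1"),
        (os.path.join(ctx, "operations"), "vaddr"),
        (os.path.join(ctx, "targets", "nr_targets"), "1"),
        (os.path.join(ctx, "targets", "0", "pid_target"), str(pid)),
        (os.path.join(ctx, "schemes", "nr_schemes"), "1"),
        # "collapse" is the keyword this patch adds; kernels without the
        # patch are expected to reject it.
        (os.path.join(sch, "action"), "collapse"),
        (os.path.join(kd, "state"), "on"),
    ]


def apply_writes(writes):
    """Apply the plan. Needs root and a kernel carrying this patch."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)
```

Building the plan separately from applying it keeps the sketch
inspectable on machines that do not have the patch applied.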

This new action may be required to support autotuning with hugepage
as a goal[1].

[1]: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/

---------
Benchmarks:

Tests were performed on a physical ARM server with MariaDB 10.5 and
sysbench. A read-only benchmark was performed with uniform row hitting,
which means that all rows are accessed with equal probability.

T n, D h: THP set to never, DAMON action set to hugepage
T m, D h: THP set to madvise, DAMON action set to hugepage
T n, D c: THP set to never, DAMON action set to collapse

Memory consumption. Lower is better.

+------------------+----------+----------+----------+
|                  | T n, D h | T m, D h | T n, D c |
+------------------+----------+----------+----------+
| Total memory use | 2.07     | 2.09     | 2.07     |
| Huge pages       | 0        | 1.3      | 1.25     |
+------------------+----------+----------+----------+

Performance in TPS (Transactions Per Second). Higher is better.

T n, D h: 18324.57
T m, D h: 18452.69
T n, D c: 18432.17

Performance counters

I collected the number of L1 D/I TLB accesses and the number of D/I TLB
accesses that triggered a page walk. I divided the second by the
first to get the percentage of page walks per TLB access. Lower is
better.

+---------------+--------------+--------------+--------------+
|               | T n, D h     | T m, D h     | T n, D c     |
+---------------+--------------+--------------+--------------+
| L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
| L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
| DTLB walk     | 75011087     | 52800418     | 55895794     |
| ITLB walk     | 71577076     | 71505137     | 67262140     |
| DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
| ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
+---------------+--------------+--------------+--------------+
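
The percentage rows can be reproduced from the raw counters. The
following is a minimal sketch of that arithmetic, with the counter
values copied from the table above. The function and variable names are
illustrative only.

```python
# Raw counters copied from the table: (TLB accesses, page walks).
COUNTERS = {
    "DTLB": {
        "T n, D h": (127248242753, 75011087),
        "T m, D h": (125431020479, 52800418),
        "T n, D c": (125327001821, 55895794),
    },
    "ITLB": {
        "T n, D h": (80332558619, 71577076),
        "T m, D h": (79346759071, 71505137),
        "T n, D c": (79298139590, 67262140),
    },
}


def walk_percentage(accesses, walks):
    """Percentage of TLB accesses that triggered a page walk."""
    return 100.0 * walks / accesses


def miss_table(counters):
    """Map each TLB and configuration to its page-walk percentage."""
    return {
        tlb: {cfg: walk_percentage(a, w) for cfg, (a, w) in cfgs.items()}
        for tlb, cfgs in counters.items()
    }


if __name__ == "__main__":
    for tlb, cfgs in miss_table(COUNTERS).items():
        for cfg, pct in cfgs.items():
            print(f"{tlb} {cfg}: {pct:.9f} %")
```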

- We can see that the DAMOS "hugepage" action works only when THP is
  set to madvise. The "collapse" action works even when THP is set to
  never.
- Performance for the "collapse" action is slightly lower than for the
  "hugepage" action with THP madvise.
- Memory consumption is slightly lower for "collapse" than for
  "hugepage" with THP madvise. This is because khugepaged collapses all
  VMAs, while the "collapse" action only collapses the VMAs in the hot
  region.
- THP utilization improves when collapse is triggered through either
  the "hugepage" or the "collapse" action.
- The "collapse" action is performed synchronously, which means that
  page collapses happen earlier and more rapidly. This can be
  useful or not, depending on the scenario.

The collapse action just adds a new option to choose the correct system
balance.

Changes
---------
RFC v2 -> v1:
Fixed a missing comma in the selftest python script
Added performance benchmarks

RFC v1 -> RFC v2:
Added benchmarks
Added damos_filter_type documentation for new action to fix kernel-doc

Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
---
 Documentation/mm/damon/design.rst      |  4 ++++
 include/linux/damon.h                  |  2 ++
 mm/damon/sysfs-schemes.c               |  4 ++++
 mm/damon/vaddr.c                       |  3 +++
 tools/testing/selftests/damon/sysfs.py | 11 ++++++-----
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 838b14d22519..405142641e55 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -467,6 +467,10 @@ that supports each action are as below.
    Supported by ``vaddr`` and ``fvaddr`` operations set. When
    TRANSPARENT_HUGEPAGE is disabled, the application of the action will just
    fail.
+ - ``collapse``: Call ``madvise()`` for the region with ``MADV_COLLAPSE``.
+   Supported by ``vaddr`` and ``fvaddr`` operations set. When
+   TRANSPARENT_HUGEPAGE is disabled, the application of the action will just
+   fail.
  - ``lru_prio``: Prioritize the region on its LRU lists.
    Supported by ``paddr`` operations set.
  - ``lru_deprio``: Deprioritize the region on its LRU lists.
diff --git a/include/linux/damon.h b/include/linux/damon.h
index d9a3babbafc1..6941113968ec 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -121,6 +121,7 @@ struct damon_target {
  * @DAMOS_PAGEOUT:	Reclaim the region.
  * @DAMOS_HUGEPAGE:	Call ``madvise()`` for the region with MADV_HUGEPAGE.
  * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
+ * @DAMOS_COLLAPSE:	Call ``madvise()`` for the region with MADV_COLLAPSE.
  * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
  * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
  * @DAMOS_MIGRATE_HOT:  Migrate the regions prioritizing warmer regions.
@@ -140,6 +141,7 @@ enum damos_action {
 	DAMOS_PAGEOUT,
 	DAMOS_HUGEPAGE,
 	DAMOS_NOHUGEPAGE,
+	DAMOS_COLLAPSE,
 	DAMOS_LRU_PRIO,
 	DAMOS_LRU_DEPRIO,
 	DAMOS_MIGRATE_HOT,
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 5186966dafb3..aa08a8f885fb 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -2041,6 +2041,10 @@ static struct damos_sysfs_action_name damos_sysfs_action_names[] = {
 		.action = DAMOS_NOHUGEPAGE,
 		.name = "nohugepage",
 	},
+	{
+		.action = DAMOS_COLLAPSE,
+		.name = "collapse",
+	},
 	{
 		.action = DAMOS_LRU_PRIO,
 		.name = "lru_prio",
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index b069dbc7e3d2..dd5f2d7027ac 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -903,6 +903,9 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 	case DAMOS_NOHUGEPAGE:
 		madv_action = MADV_NOHUGEPAGE;
 		break;
+	case DAMOS_COLLAPSE:
+		madv_action = MADV_COLLAPSE;
+		break;
 	case DAMOS_MIGRATE_HOT:
 	case DAMOS_MIGRATE_COLD:
 		return damos_va_migrate(t, r, scheme, sz_filter_passed);
diff --git a/tools/testing/selftests/damon/sysfs.py b/tools/testing/selftests/damon/sysfs.py
index 3aa5c91548a5..72f53180c6a8 100755
--- a/tools/testing/selftests/damon/sysfs.py
+++ b/tools/testing/selftests/damon/sysfs.py
@@ -123,11 +123,12 @@ def assert_scheme_committed(scheme, dump):
             'pageout': 2,
             'hugepage': 3,
             'nohugeapge': 4,
-            'lru_prio': 5,
-            'lru_deprio': 6,
-            'migrate_hot': 7,
-            'migrate_cold': 8,
-            'stat': 9,
+            'collapse': 5,
+            'lru_prio': 6,
+            'lru_deprio': 7,
+            'migrate_hot': 8,
+            'migrate_cold': 9,
+            'stat': 10,
             }
     assert_true(dump['action'] == action_val[scheme.action], 'action', dump)
     assert_true(dump['apply_interval_us'] == scheme. apply_interval_us,
-- 
2.43.0
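
The selftest hunk above renumbers every action after "nohugepage"
because the new enumerator is inserted in the middle of enum
damos_action. A minimal sketch of how the same mapping could be derived
from the enum order instead of being typed by hand is shown below. The
entries "willneed" and "cold" do not appear in the hunk and are assumed
from the kernel's enum order. The sketch uses the sysfs keyword
spelling, and the helper name is hypothetical.

```python
# Order of enum damos_action once DAMOS_COLLAPSE is added. The first two
# entries are an assumption; they are outside the context of the hunk.
DAMOS_ACTIONS = [
    "willneed",
    "cold",
    "pageout",
    "hugepage",
    "nohugepage",
    "collapse",
    "lru_prio",
    "lru_deprio",
    "migrate_hot",
    "migrate_cold",
    "stat",
]


def action_values(names):
    """Map each action keyword to its enum value (its index in the list)."""
    return {name: idx for idx, name in enumerate(names)}


if __name__ == "__main__":
    for name, value in action_values(DAMOS_ACTIONS).items():
        print(f"{name}: {value}")
```

Deriving the values this way means a future enumerator only has to be
added in one place.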




* Re: (sashiko review) [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-30 14:57 [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action gutierrez.asier
@ 2026-03-30 23:43 ` SeongJae Park
  2026-03-31  0:01   ` SeongJae Park
  2026-03-31  1:31 ` SeongJae Park
  1 sibling, 1 reply; 9+ messages in thread
From: SeongJae Park @ 2026-03-30 23:43 UTC (permalink / raw)
  To: gutierrez.asier
  Cc: SeongJae Park, artem.kuzin, stepanov.anatoly, wangkefeng.wang,
	yanquanmin1, zuoze1, damon, akpm, linux-mm, linux-kernel

Forwarding the Sashiko review so that we can discuss it via mail.

# review url: https://sashiko.dev/#/patchset/20260330145758.2115502-1-gutierrez.asier@huawei-partners.com
# start of sashiko.dev inline review
commit cd78854e55ee90102333f0331c58417c16e8d798
Author: Asier Gutierrez <gutierrez.asier@huawei-partners.com>

mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action

This commit introduces the DAMOS_COLLAPSE action to DAMON. It allows DAMON
to synchronously collapse memory regions into huge pages by leveraging the
existing MADV_COLLAPSE mechanism.

Link: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/

[ ... ]

> diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> index b069dbc7e3d2..dd5f2d7027ac 100644
> --- a/mm/damon/vaddr.c
> +++ b/mm/damon/vaddr.c
> @@ -903,6 +903,9 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
>  	case DAMOS_NOHUGEPAGE:
>  		madv_action = MADV_NOHUGEPAGE;
>  		break;
> +	case DAMOS_COLLAPSE:
> +		madv_action = MADV_COLLAPSE;

Could exposing MADV_COLLAPSE to automated continuous execution via DAMOS
trigger a pre-existing use-after-free in the core madvise logic?

In madvise_collapse(), the lock_dropped boolean tracks whether the lock is
currently dropped and tells madvise_walk_vmas() if the lock was ever dropped.

If collapse_single_pmd() drops the mmap lock, *lock_dropped is set to true.
On the next loop iteration, madvise_collapse() re-acquires the lock,
validates the VMA, and resets *lock_dropped = false to avoid double-locking.
If a subsequent iteration does not drop the lock, the function returns to
madvise_walk_vmas() with lock_dropped set to false.

Because madvise_walk_vmas() sees false, it assumes the lock was continuously
held and the original VMA pointer is still safe. However, another thread
could have freed the original VMA while the lock was temporarily dropped,
leading to a use-after-free when madvise_walk_vmas() dereferences
vma->vm_end.

While this flaw pre-dates this patch, does introducing DAMOS_COLLAPSE expose
it to continuous background execution by kdamond, making the race condition
much more likely to occur in practice?

> +		break;
>  	case DAMOS_MIGRATE_HOT:
>  	case DAMOS_MIGRATE_COLD:

# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260330145758.2115502-1-gutierrez.asier@huawei-partners.com
#
# hkml [1] generated a draft of this mail.  It can be regenerated
# using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260330145758.2115502-1-gutierrez.asier@huawei-partners.com
#
# [1] https://github.com/sjp38/hackermail

Sent using hkml (https://github.com/sjp38/hackermail)


* Re: (sashiko review) [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-30 23:43 ` (sashiko review) " SeongJae Park
@ 2026-03-31  0:01   ` SeongJae Park
  0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-03-31  0:01 UTC (permalink / raw)
  To: SeongJae Park
  Cc: gutierrez.asier, artem.kuzin, stepanov.anatoly, wangkefeng.wang,
	yanquanmin1, zuoze1, damon, akpm, linux-mm, linux-kernel,
	Liam R . Howlett, Lorenzo Stoakes, David Hildenbrand,
	Vlastimil Babka, Jann Horn

+Cc madvise developers (maintainers and reviewers).

On Mon, 30 Mar 2026 16:43:37 -0700 SeongJae Park <sj@kernel.org> wrote:

> Forwarding Sashiko review for doing discussions via mails.
> 
> # review url: https://sashiko.dev/#/patchset/20260330145758.2115502-1-gutierrez.asier@huawei-partners.com
> # start of sashiko.dev inline review
> commit cd78854e55ee90102333f0331c58417c16e8d798
> Author: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
> 
> mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
> 
> This commit introduces the DAMOS_COLLAPSE action to DAMON. It allows DAMON
> to synchronously collapse memory regions into huge pages by leveraging the
> existing MADV_COLLAPSE mechanism.
> 
> Link: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/
> 
> [ ... ]
> 
> > diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> > index b069dbc7e3d2..dd5f2d7027ac 100644
> > --- a/mm/damon/vaddr.c
> > +++ b/mm/damon/vaddr.c
> > @@ -903,6 +903,9 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
> >  	case DAMOS_NOHUGEPAGE:
> >  		madv_action = MADV_NOHUGEPAGE;
> >  		break;
> > +	case DAMOS_COLLAPSE:
> > +		madv_action = MADV_COLLAPSE;
> 
> Could exposing MADV_COLLAPSE to automated continuous execution via DAMOS
> trigger a pre-existing use-after-free in the core madvise logic?
> 
> In madvise_collapse(), the lock_dropped boolean tracks whether the lock is
> currently dropped and tells madvise_walk_vmas() if the lock was ever dropped.
> 
> If collapse_single_pmd() drops the mmap lock, *lock_dropped is set to true.
> On the next loop iteration, madvise_collapse() re-acquires the lock,
> validates the VMA, and resets *lock_dropped = false to avoid double-locking.
> If a subsequent iteration does not drop the lock, the function returns to
> madvise_walk_vmas() with lock_dropped set to false.
> 
> Because madvise_walk_vmas() sees false, it assumes the lock was continuously
> held and the original VMA pointer is still safe. However, another thread
> could have freed the original VMA while the lock was temporarily dropped,
> leading to a use-after-free when madvise_walk_vmas() dereferences
> vma->vm_end.
> 
> While this flaw pre-dates this patch,

It sounds like sashiko is saying the bug still exists and is not yet fixed.
madvise() developers, could you please confirm whether that is true?

> does introducing DAMOS_COLLAPSE expose
> it to continuous background execution by kdamond, making the race condition
> much more likely to occur in practice?

DAMOS_COLLAPSE simply calls do_madvise(), so I think it shouldn't be different
from user space programs calling madvise(MADV_COLLAPSE).  Hence I don't think
this patch is making something worse.

Nonetheless (if sashiko's argument, as I understand it, is not a hallucination
but real), I agree this change could make the bug trigger more frequently
if DAMOS_COLLAPSE is adopted by more people faster than the bugfix is merged.
If that's the case and it concerns madvise() developers, I think we can hold
this patch or adjust the schedules to ensure this patch is merged only after
the MADV_COLLAPSE bug fix.  madvise() developers, please let us know if you
think such a hold or schedule adjustment is needed.


Thanks,
SJ

[...]



* Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-30 14:57 [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action gutierrez.asier
  2026-03-30 23:43 ` (sashiko review) " SeongJae Park
@ 2026-03-31  1:31 ` SeongJae Park
  2026-03-31 10:46   ` Stepanov Anatoly
  2026-03-31 15:15   ` Gutierrez Asier
  1 sibling, 2 replies; 9+ messages in thread
From: SeongJae Park @ 2026-03-31  1:31 UTC (permalink / raw)
  To: gutierrez.asier
  Cc: SeongJae Park, artem.kuzin, stepanov.anatoly, wangkefeng.wang,
	yanquanmin1, zuoze1, damon, akpm, linux-mm, linux-kernel

Hello Asier,

On Mon, 30 Mar 2026 14:57:58 +0000 <gutierrez.asier@huawei-partners.com> wrote:

> From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
> 
> This patch set introces a new action:  DAMOS_COLLAPSE.
> 
> For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
> working, since it relies on hugepage_madvise to add a new slot. This
> slot should be picked up by khugepaged and eventually collapse (or
> not, if we are using DAMOS_NOHUGEPAGE) the pages. If THP is not
> enabled, khugepaged will not be working, and therefore no collapse
> will happen.

I should have raised this in a previous version, sorry.  But that is only half
of the picture.  That is, khugepaged is not the only THP allocator for
MADV_HUGEPAGE.  IIUC, MADV_HUGEPAGE-applied regions also get huge pages
allocated at page fault time.  According to the man page,

    The kernel will regularly scan the areas marked as huge page candidates
    to replace them with huge pages.  The kernel will also allocate huge pages
    directly when the region is naturally aligned to the huge page size (see
    posix_memalign(2)).

I think the description would be better wordsmithed or clarified.  Maybe just
pointing to the MADV_COLLAPSE intro commit (7d8faaf15545 ("mm/madvise: introduce
MADV_COLLAPSE sync hugepage collapse")) for the rationale could also be a good
approach, as the goal of DAMOS_COLLAPSE is not different from that of
MADV_COLLAPSE.

> 
> DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse
> the address range synchronously.
> 
> This new action may be required to support autotuning with hugepage
> as a goal[1].
> 
> [1]: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/
> 
> ---------
> Benchmarks:

I recently heard some tools could treat the above line as the commentary
area [1] separation line.  Please use a ==== like separator instead.  For
example,

    Benchmarks
    ==========

> 
> Tests were performed in an ARM physical server with MariaDB 10.5 and 
> sysbench. Read only benchmark was perform with uniform row hitting,
> which means that all rows will be access with equal probability.
> 
> T n, D h: THP set to never, DAMON action set to hugepage
> T m, D h: THP set to madvise, DAMON action set to hugepage
> T n, D c: THP set to never, DAMON action set to collapse
> 
> Memory consumption. Lower is better.
> 
> +------------------+----------+----------+----------+
> |                  | T n, D h | T m, D h | T n, D c |
> +------------------+----------+----------+----------+
> | Total memory use | 2.07     | 2.09     | 2.07     |
> | Huge pages       | 0        | 1.3      | 1.25     |
> +------------------+----------+----------+----------+
> 
> Performance in TPS (Transactions Per Second). Higher is better.
> 
> T n, D h: 18324.57
> T n, D h 18452.69

"T m, D h" ?

> T n, D c: 18432.17
> 
> Performance counter
> 
> I got the number of L1 D/I TLB accesses and the number a D/I TLB
> accesses that triggered a page walk. I divided the second by the
> first to get the percentage of page walkes per TLB access. The
> lower the better.
> 
> +---------------+--------------+--------------+--------------+
> |               | T n, D h     | T m, D h     | T n, D c     |
> +---------------+--------------+--------------+--------------+
> | L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
> | L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
> | DTLB walk     | 75011087     | 52800418     | 55895794     |
> | ITLB walk     | 71577076     | 71505137     | 67262140     |
> | DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
> | ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
> +---------------+--------------+--------------+--------------+
> 
> - We can see that DAMOS "hugepage" action works only when THP is set
>   to madvise. "collapse" action works even when THP is set to never.

Make sense.

> - Performance for "collapse" action is slightly lower than "hugepage"
>   action and THP madvise.

It would be good to add your theory about where the difference comes from.  I
suspect that's mainly because the "hugepage" setup was allocating more THP?

> - Memory consumption is slighly lower for "collapse" than "hugepage"
>   with THP madvise. This is due to the khugepage collapses all VMAs,
>   while "collapse" action only collapses the VMAs in the hot region.

But you use thp=madvise, not thp=always?  So only hot regions, to which
DAMOS_HUGEPAGE was applied, could use THP.  That is the same as the
DAMOS_COLLAPSE use case, isn't it?

I'd rather suspect the naturally aligned region huge page allocation of
DAMOS_HUGEPAGE as the reason for this difference.  That is, DAMOS_HUGEPAGE-applied
regions can allocate huge pages at fault time, on multiple user threads.
Meanwhile, DAMOS_COLLAPSE is executed by the single kdamond (if you
use only a single kdamond).  This might have resulted in DAMOS_HUGEPAGE
allocating more huge pages faster than DAMOS_COLLAPSE?

> - There is an improvement in THP utilization when collapse through
>   "hugepage" or "collapse" actions are triggered.

Could you clarify which data point shows this?  Maybe "Huge pages" /
"Total memory use" ?  And why?  I again suspect the fault-time huge page
allocation.
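
For reference, the "Huge pages" / "Total memory use" ratio can be
computed from the quoted memory table as in the minimal sketch below.
It assumes both rows are reported in the same unit, which the report
does not state. The names are illustrative only.

```python
# (total memory use, huge pages) per configuration, copied from the quoted
# table. The unit is not given in the report; only the ratio is used here.
MEMORY = {
    "T n, D h": (2.07, 0.0),
    "T m, D h": (2.09, 1.3),
    "T n, D c": (2.07, 1.25),
}


def huge_fraction(total, huge):
    """Fraction of the total memory use that is backed by huge pages."""
    return huge / total


if __name__ == "__main__":
    for cfg, (total, huge) in MEMORY.items():
        print(f"{cfg}: {huge_fraction(total, huge):.3f}")
```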

> - "collapse" action is performance synchronously, which means that
>   page collapses happen earlier and more rapidly.

But these test results do not show it clearly.  Rather, the results say
"hugepage" was able to make more huge pages than "collapse".  The above
sentence still makes sense when we talk about "collapsing" operations, but
this test does not show it clearly.  I think we should make the limitation
of this test clear.

>   This can be
>   useful or not, depending on the scenario.
> 
> Collapse action just adds a new option to chose the correct system
> balance.

That's a fair point.  I believe we also discussed the pros and cons of
MADV_COLLAPSE, and concluded MADV_COLLAPSE is worth adding.  For
DAMOS_COLLAPSE, I don't think we have to do that again.

> 
> Changes
> ---------
> RFC v2 -> v1:
> Fixed a missing comma in the selftest python stript
> Added performance benchmarks
> 
> RFC v1 -> RFC v2:
> Added benchmarks
> Added damos_filter_type documentation for new action to fix kernel-doc

Please put the changelog in the commentary area, and consider adding links to
the previous revisions [1].

> 
> Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
> ---

Code looks good to me.  Nonetheless, I'd hope the above commit message and
benchmark results analysis could be more polished and/or clarified.

[1] https://docs.kernel.org/process/submitting-patches.html#commentary

Thanks,
SJ


* Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-31  1:31 ` SeongJae Park
@ 2026-03-31 10:46   ` Stepanov Anatoly
  2026-03-31 10:50     ` Stepanov Anatoly
  2026-03-31 15:15   ` Gutierrez Asier
  1 sibling, 1 reply; 9+ messages in thread
From: Stepanov Anatoly @ 2026-03-31 10:46 UTC (permalink / raw)
  To: SeongJae Park, gutierrez.asier
  Cc: artem.kuzin, wangkefeng.wang, yanquanmin1, zuoze1, damon, akpm,
	linux-mm, linux-kernel

On 3/31/2026 4:31 AM, SeongJae Park wrote:
> Hello Asier,
> 
> On Mon, 30 Mar 2026 14:57:58 +0000 <gutierrez.asier@huawei-partners.com> wrote:
> 
>> From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>>
>> This patch set introces a new action:  DAMOS_COLLAPSE.
>>
>> For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
>> working, since it relies on hugepage_madvise to add a new slot. This
>> slot should be picked up by khugepaged and eventually collapse (or
>> not, if we are using DAMOS_NOHUGEPAGE) the pages. If THP is not
>> enabled, khugepaged will not be working, and therefore no collapse
>> will happen.
> 
> I should raised this in a previous version, sorry.  But, that is only a half of
> the picture.  That is, khugepaged is not the single THP allocator for
> MADV_HUGEPAGE.  IIUC, MADV_HUGEPAGE-applied region also allocates huge pages in
> page fault time.  According to the man page,
> 
>     The kernel will regularly scan the areas marked as huge page  candidates
>     to replace  them with huge pages.  The kernel will also allocate huge pages
>     directly when the region is naturally aligned to the huge page size (see
>     posix_memalign(2)).
> 
I think the key difference between DAMOS_HUGEPAGE and DAMOS_COLLAPSE is the granularity.

In the DAMOS_HUGEPAGE case, the granularity is always the VMA, even if the hot region is narrow.
It's true for both page-fault based collapse and khugepaged collapse.

With DAMOS_COLLAPSE we can cover cases where, for example, a large VMA
contains a hot VA region inside, so we can collapse just that region, not the whole VMA.
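
To illustrate the region granularity, here is a minimal sketch of how a
hot region maps to the huge-page-aligned subrange that a collapse can
act on. It follows the rounding as I understand it from
madvise_collapse() (start rounded up, end rounded down to the PMD
size). The 2 MiB size is an assumption that holds for 4 KiB base pages
and differs with other base page sizes. The function name is
hypothetical.

```python
# Assumption: 4 KiB base pages, so a PMD-sized huge page is 2 MiB.
HPAGE_PMD_SIZE = 2 * 1024 * 1024


def collapse_range(start, end, hpage=HPAGE_PMD_SIZE):
    """Return the (hstart, hend) subrange of [start, end) made of whole
    huge-page-sized, huge-page-aligned chunks, or None if there is none."""
    mask = ~(hpage - 1)
    hstart = (start + hpage - 1) & mask  # round start up
    hend = end & mask                    # round end down
    if hstart >= hend:
        return None
    return hstart, hend


if __name__ == "__main__":
    # A hot region inside a much larger VMA: only the aligned part of the
    # region is a collapse candidate, not the whole VMA.
    print(collapse_range(0x1000, 0x600000))
```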
 

> I think the description is better to be wordsmithed or clarified.  Maybe just
> pointing the MADV_COLLAPSE intro commit (7d8faaf15545 ("mm/madvise: introduce
> MADV_COLLAPSE sync hugepage collapse")) for the rationale could also be a good
> approach, as the aimed goal of DAMOS_COLLAPSE is not different from
> MADV_COLLAPSE.
> 
>>
>> DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse
>> the address range synchronously.
>>
>> This new action may be required to support autotuning with hugepage
>> as a goal[1].
>>
>> [1]: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/
>>
>> ---------
>> Benchmarks:
> 
> I recently heard some tools could think above line as the commentary
> area [1] separation line.  Please use ==== like separator instead.  For
> example,
> 
>     Benchmarks
>     ==========
> 
>>
>> Tests were performed in an ARM physical server with MariaDB 10.5 and 
>> sysbench. Read only benchmark was perform with uniform row hitting,
>> which means that all rows will be access with equal probability.
>>
>> T n, D h: THP set to never, DAMON action set to hugepage
>> T m, D h: THP set to madvise, DAMON action set to hugepage
>> T n, D c: THP set to never, DAMON action set to collapse
>>
>> Memory consumption. Lower is better.
>>
>> +------------------+----------+----------+----------+
>> |                  | T n, D h | T m, D h | T n, D c |
>> +------------------+----------+----------+----------+
>> | Total memory use | 2.07     | 2.09     | 2.07     |
>> | Huge pages       | 0        | 1.3      | 1.25     |
>> +------------------+----------+----------+----------+
>>
>> Performance in TPS (Transactions Per Second). Higher is better.
>>
>> T n, D h: 18324.57
>> T n, D h 18452.69
> 
> "T m, D h" ?
> 
>> T n, D c: 18432.17
>>
>> Performance counter
>>
>> I got the number of L1 D/I TLB accesses and the number a D/I TLB
>> accesses that triggered a page walk. I divided the second by the
>> first to get the percentage of page walkes per TLB access. The
>> lower the better.
>>
>> +---------------+--------------+--------------+--------------+
>> |               | T n, D h     | T m, D h     | T n, D c     |
>> +---------------+--------------+--------------+--------------+
>> | L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
>> | L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
>> | DTLB walk     | 75011087     | 52800418     | 55895794     |
>> | ITLB walk     | 71577076     | 71505137     | 67262140     |
>> | DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
>> | ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
>> +---------------+--------------+--------------+--------------+
>>
>> - We can see that DAMOS "hugepage" action works only when THP is set
>>   to madvise. "collapse" action works even when THP is set to never.
> 
> Make sense.
> 
>> - Performance for "collapse" action is slightly lower than "hugepage"
>>   action and THP madvise.
> 
> It would be good to add your theory about from where the difference comes.  I
> suspect that's mainly because "hugepage" setup was allocating more THP?
> 
>> - Memory consumption is slighly lower for "collapse" than "hugepage"
>>   with THP madvise. This is due to the khugepage collapses all VMAs,
>>   while "collapse" action only collapses the VMAs in the hot region.
> 
> But you use thp=madvise, not thp=always?  So only hot regions, which
> DAMOS_HUGEPAGE applied, could use THP.  It is same to DAMOS_COLLAPSE use case,
> isn't it?
> 
> I'd rather suspect the natural-aligned region huge page allocation of
> DAMOS_HUGEPAGE as a reason of this difference.  That is, DAMOS_HUGEPAGE applied
> regions can allocate hugepages in the fault time, on multiple user threads.
> Meanwhile, DAMOS_COLLAPSE should be executed by the single kdamond (if you
> utilize only single kdamond).  This might resulted in DAMOS_HUGEPAGE allocating
> more huge pages faster than DAMOS_COLLAPSE?
> 
>> - There is an improvement in THP utilization when collapse through
>>   "hugepage" or "collapse" actions are triggered.
> 
> Could you clarify which data point is showing this?  Maybe "Huge pages" /
> "Total memory use" ?  And why?  I again suspect the fault time huge pages
> allocation.
> 
>> - "collapse" action is performance synchronously, which means that
>>   page collapses happen earlier and more rapidly.
> 
> But these test results are not showing it clearly.  Rather, the results is
> saying "hugepage" was able to make more huge pages than "collapse".  Still the
> above sentence makes sense when we say about "collapsing" operations.  But,
> this test is not showing it clearly.  I think we should make it clear the
> limitation of this test.
> 
>>   This can be
>>   useful or not, depending on the scenario.
>>
>> Collapse action just adds a new option to chose the correct system
>> balance.
> 
> That's a fair point.  I believe we also discussed pros and cons of
> MADV_COLLAPSE, and concluded MADV_COLLAPSE is worthy to be added.  For
> DAMOS_COLLAPSE, I don't think we have to do that again.
> 
>>
>> Changes
>> ---------
>> RFC v2 -> v1:
>> Fixed a missing comma in the selftest python stript
>> Added performance benchmarks
>>
>> RFC v1 -> RFC v2:
>> Added benchmarks
>> Added damos_filter_type documentation for new action to fix kernel-doc
> 
> Please put changelog in the commentary area, and consider adding links to the
> previous revisions [1].
> 
>>
>> Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>> ---
> 
> Code looks good to me.  Nonetheless I'd hope above commit message and benchmark
> results analysis be more polished and/or clarified.
> 
> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
> 
> 
> Thanks,
> SJ


-- 
Anatoly Stepanov, Huawei



* Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-31 10:46   ` Stepanov Anatoly
@ 2026-03-31 10:50     ` Stepanov Anatoly
  2026-04-01  0:54       ` SeongJae Park
  0 siblings, 1 reply; 9+ messages in thread
From: Stepanov Anatoly @ 2026-03-31 10:50 UTC (permalink / raw)
  To: SeongJae Park, gutierrez.asier
  Cc: artem.kuzin, wangkefeng.wang, yanquanmin1, zuoze1, damon, akpm,
	linux-mm, linux-kernel

On 3/31/2026 1:46 PM, Stepanov Anatoly wrote:
> On 3/31/2026 4:31 AM, SeongJae Park wrote:
>> Hello Asier,
>>
>> On Mon, 30 Mar 2026 14:57:58 +0000 <gutierrez.asier@huawei-partners.com> wrote:
>>
>>> From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>>>
>>> This patch set introces a new action:  DAMOS_COLLAPSE.
>>>
>>> For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
>>> working, since it relies on hugepage_madvise to add a new slot. This
>>> slot should be picked up by khugepaged and eventually collapse (or
>>> not, if we are using DAMOS_NOHUGEPAGE) the pages. If THP is not
>>> enabled, khugepaged will not be working, and therefore no collapse
>>> will happen.
>>
>> I should have raised this in a previous version, sorry.  But that is only half
>> of the picture.  That is, khugepaged is not the only THP allocator for
>> MADV_HUGEPAGE.  IIUC, MADV_HUGEPAGE-applied regions also get huge pages
>> allocated at page fault time.  According to the man page,
>>
>>     The kernel will regularly scan the areas marked as huge page candidates
>>     to replace them with huge pages.  The kernel will also allocate huge pages
>>     directly when the region is naturally aligned to the huge page size (see
>>     posix_memalign(2)).
>>
> I think the key difference between DAMOS_HUGEPAGE and DAMOS_COLLAPSE is the granularity.
> 
> In the DAMOS_HUGEPAGE case, the granularity is always the VMA, even if the hot region is narrow.
> It's true for both page-fault based collapse and khugepaged collapse.
*page-fault THP allocation, not collapse of course.

> 
> With DAMOS_COLLAPSE we can cover cases where, for example, a large VMA
> contains a hot VA region, so we can collapse just that region, not the whole VMA.
>  
> 
>> I think the description should be wordsmithed or clarified.  Maybe just
>> pointing to the MADV_COLLAPSE intro commit (7d8faaf15545 ("mm/madvise: introduce
>> MADV_COLLAPSE sync hugepage collapse")) for the rationale could also be a good
>> approach, as the goal of DAMOS_COLLAPSE is no different from that of
>> MADV_COLLAPSE.
>>
>>>
>>> DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse
>>> the address range synchronously.
>>>
>>> This new action may be required to support autotuning with hugepage
>>> as a goal[1].
>>>
>>> [1]: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/
>>>
>>> ---------
>>> Benchmarks:
>>
>> I recently heard some tools could treat the above line as the commentary
>> area [1] separation line.  Please use a ====-like separator instead.  For
>> example,
>>
>>     Benchmarks
>>     ==========
>>
>>>
>>> Tests were performed on an ARM physical server with MariaDB 10.5 and
>>> sysbench. A read-only benchmark was performed with uniform row hitting,
>>> which means that all rows are accessed with equal probability.
>>>
>>> T n, D h: THP set to never, DAMON action set to hugepage
>>> T m, D h: THP set to madvise, DAMON action set to hugepage
>>> T n, D c: THP set to never, DAMON action set to collapse
>>>
>>> Memory consumption. Lower is better.
>>>
>>> +------------------+----------+----------+----------+
>>> |                  | T n, D h | T m, D h | T n, D c |
>>> +------------------+----------+----------+----------+
>>> | Total memory use | 2.07     | 2.09     | 2.07     |
>>> | Huge pages       | 0        | 1.3      | 1.25     |
>>> +------------------+----------+----------+----------+
>>>
>>> Performance in TPS (Transactions Per Second). Higher is better.
>>>
>>> T n, D h: 18324.57
>>> T n, D h 18452.69
>>
>> "T m, D h" ?
>>
>>> T n, D c: 18432.17
>>>
>>> Performance counter
>>>
>>> I got the number of L1 D/I TLB accesses and the number of D/I TLB
>>> accesses that triggered a page walk. I divided the second by the
>>> first to get the percentage of page walks per TLB access. The
>>> lower the better.
>>>
>>> +---------------+--------------+--------------+--------------+
>>> |               | T n, D h     | T m, D h     | T n, D c     |
>>> +---------------+--------------+--------------+--------------+
>>> | L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
>>> | L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
>>> | DTLB walk     | 75011087     | 52800418     | 55895794     |
>>> | ITLB walk     | 71577076     | 71505137     | 67262140     |
>>> | DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
>>> | ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
>>> +---------------+--------------+--------------+--------------+
>>>
>>> - We can see that DAMOS "hugepage" action works only when THP is set
>>>   to madvise. "collapse" action works even when THP is set to never.
>>
>> Makes sense.
>>
>>> - Performance for "collapse" action is slightly lower than "hugepage"
>>>   action and THP madvise.
>>
>> It would be good to add your theory about where the difference comes from.  I
>> suspect that's mainly because the "hugepage" setup was allocating more THP?
>>
>>> - Memory consumption is slightly lower for "collapse" than "hugepage"
>>>   with THP madvise. This is because khugepaged collapses all VMAs,
>>>   while the "collapse" action only collapses the VMAs in the hot region.
>>
>> But you use thp=madvise, not thp=always?  So only hot regions, to which
>> DAMOS_HUGEPAGE is applied, could use THP.  It is the same as the
>> DAMOS_COLLAPSE use case, isn't it?
>>
>> I'd rather suspect the naturally aligned region huge page allocation of
>> DAMOS_HUGEPAGE as the reason for this difference.  That is, DAMOS_HUGEPAGE-applied
>> regions can allocate huge pages at fault time, on multiple user threads.
>> Meanwhile, DAMOS_COLLAPSE is executed by the single kdamond (if you
>> utilize only a single kdamond).  This might have resulted in DAMOS_HUGEPAGE
>> allocating more huge pages faster than DAMOS_COLLAPSE?
>>
>>> - There is an improvement in THP utilization when collapse is triggered
>>>   through the "hugepage" or "collapse" actions.
>>
>> Could you clarify which data point is showing this?  Maybe "Huge pages" /
>> "Total memory use"?  And why?  I again suspect the fault-time huge page
>> allocation.
>>
>>> - "collapse" action is performed synchronously, which means that
>>>   page collapses happen earlier and more rapidly.
>>
>> But these test results are not showing it clearly.  Rather, the results are
>> saying "hugepage" was able to make more huge pages than "collapse".  Still, the
>> above sentence makes sense when we talk about "collapsing" operations.  But
>> this test is not showing it clearly.  I think we should make the limitation
>> of this test clear.
>>
>>>   This can be
>>>   useful or not, depending on the scenario.
>>>
>>> Collapse action just adds a new option to choose the correct system
>>> balance.
>>
>> That's a fair point.  I believe we also discussed the pros and cons of
>> MADV_COLLAPSE, and concluded MADV_COLLAPSE is worth adding.  For
>> DAMOS_COLLAPSE, I don't think we have to do that again.
>>
>>>
>>> Changes
>>> ---------
>>> RFC v2 -> v1:
>>> Fixed a missing comma in the selftest python script
>>> Added performance benchmarks
>>>
>>> RFC v1 -> RFC v2:
>>> Added benchmarks
>>> Added damos_filter_type documentation for new action to fix kernel-doc
>>
>> Please put changelog in the commentary area, and consider adding links to the
>> previous revisions [1].
>>
>>>
>>> Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>>> ---
>>
>> Code looks good to me.  Nonetheless, I'd hope the above commit message and
>> benchmark results analysis can be more polished and/or clarified.
>>
>> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
>>
>>
>> Thanks,
>> SJ
> 
> 


-- 
Anatoly Stepanov, Huawei



* Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-31 10:50     ` Stepanov Anatoly
@ 2026-04-01  0:54       ` SeongJae Park
  0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-04-01  0:54 UTC (permalink / raw)
  To: Stepanov Anatoly
  Cc: SeongJae Park, gutierrez.asier, artem.kuzin, wangkefeng.wang,
	yanquanmin1, zuoze1, damon, akpm, linux-mm, linux-kernel

On Tue, 31 Mar 2026 13:50:21 +0300 Stepanov Anatoly <stepanov.anatoly@huawei.com> wrote:

> On 3/31/2026 1:46 PM, Stepanov Anatoly wrote:
> > On 3/31/2026 4:31 AM, SeongJae Park wrote:
> >> Hello Asier,
> >>
> >> On Mon, 30 Mar 2026 14:57:58 +0000 <gutierrez.asier@huawei-partners.com> wrote:
> >>
> >>> From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
> >>>
> >>> This patch set introduces a new action: DAMOS_COLLAPSE.
> >>>
> >>> For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
> >>> working, since it relies on hugepage_madvise to add a new slot. This
> >>> slot should be picked up by khugepaged and eventually collapse (or
> >>> not, if we are using DAMOS_NOHUGEPAGE) the pages. If THP is not
> >>> enabled, khugepaged will not be working, and therefore no collapse
> >>> will happen.
> >>
> >> I should have raised this in a previous version, sorry.  But that is only half
> >> of the picture.  That is, khugepaged is not the only THP allocator for
> >> MADV_HUGEPAGE.  IIUC, MADV_HUGEPAGE-applied regions also get huge pages
> >> allocated at page fault time.  According to the man page,
> >>
> >>     The kernel will regularly scan the areas marked as huge page candidates
> >>     to replace them with huge pages.  The kernel will also allocate huge pages
> >>     directly when the region is naturally aligned to the huge page size (see
> >>     posix_memalign(2)).
> >>
> > I think the key difference between DAMOS_HUGEPAGE and DAMOS_COLLAPSE is the granularity.
> > 
> > In the DAMOS_HUGEPAGE case, the granularity is always the VMA, even if the hot region is narrow.
> > It's true for both page-fault based collapse and khugepaged collapse.
> *page-fault THP allocation, not collapse of course.

Good point.  I think this difference can also help answer why DAMOS_COLLAPSE
was making fewer huge pages in the test.

> 
> > 
> > With DAMOS_COLLAPSE we can cover cases where, for example, a large VMA
> > contains a hot VA region, so we can collapse just that region, not the whole VMA.

This also makes sense to me.  I think it also aligns with what we discussed for
the MADV_COLLAPSE intro.
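
To illustrate the granularity point, here is a rough sketch of how a hot region
inside a large VMA narrows down to a huge-page-aligned range.  Everything in it
is an assumption made for illustration: the 2 MiB HPAGE_SIZE, the hypothetical
collapsible_range() helper, and the example addresses.  It is not code from the
patch; it only mirrors the idea that a collapse request covers the
huge-page-aligned part of the requested range.

```python
HPAGE_SIZE = 2 * 1024 * 1024  # assumed PMD-sized huge page (2 MiB)


def collapsible_range(start, end, hpage_size=HPAGE_SIZE):
    """Return the huge-page-aligned (start, end) inside [start, end).

    The start is rounded up and the end is rounded down to a huge page
    boundary.  Returns None if no whole huge page fits in the range.
    """
    aligned_start = (start + hpage_size - 1) // hpage_size * hpage_size
    aligned_end = end // hpage_size * hpage_size
    if aligned_start >= aligned_end:
        return None
    return aligned_start, aligned_end


if __name__ == "__main__":
    # Hypothetical 1 GiB VMA with an 8 MiB hot region inside it.
    vma = (0x7F0000000000, 0x7F0040000000)
    hot = (0x7F0010100000, 0x7F0010900000)

    vma_hpages = (vma[1] - vma[0]) // HPAGE_SIZE
    rng = collapsible_range(*hot)
    hot_hpages = (rng[1] - rng[0]) // HPAGE_SIZE
    print(f"whole VMA: {vma_hpages} huge pages")
    print(f"hot region only: {hot_hpages} huge pages "
          f"({rng[0]:#x}-{rng[1]:#x})")
```

In this made-up layout, region-granularity collapsing touches only the few
aligned huge pages inside the hot region, while VMA-granularity handling makes
the whole mapping a candidate.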


Thanks,
SJ

[...]



* Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-31  1:31 ` SeongJae Park
  2026-03-31 10:46   ` Stepanov Anatoly
@ 2026-03-31 15:15   ` Gutierrez Asier
  2026-04-01  0:59     ` SeongJae Park
  1 sibling, 1 reply; 9+ messages in thread
From: Gutierrez Asier @ 2026-03-31 15:15 UTC (permalink / raw)
  To: SeongJae Park
  Cc: artem.kuzin, stepanov.anatoly, wangkefeng.wang, yanquanmin1,
	zuoze1, damon, akpm, linux-mm, linux-kernel

Hi SJ,

On 3/31/2026 4:31 AM, SeongJae Park wrote:
> Hello Asier,
> 
> On Mon, 30 Mar 2026 14:57:58 +0000 <gutierrez.asier@huawei-partners.com> wrote:
> 
>> From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>>
>> This patch set introduces a new action: DAMOS_COLLAPSE.
>>
>> For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
>> working, since it relies on hugepage_madvise to add a new slot. This
>> slot should be picked up by khugepaged and eventually collapse (or
>> not, if we are using DAMOS_NOHUGEPAGE) the pages. If THP is not
>> enabled, khugepaged will not be working, and therefore no collapse
>> will happen.
> 
> I should have raised this in a previous version, sorry.  But that is only half
> of the picture.  That is, khugepaged is not the only THP allocator for
> MADV_HUGEPAGE.  IIUC, MADV_HUGEPAGE-applied regions also get huge pages
> allocated at page fault time.  According to the man page,
> 
>     The kernel will regularly scan the areas marked as huge page candidates
>     to replace them with huge pages.  The kernel will also allocate huge pages
>     directly when the region is naturally aligned to the huge page size (see
>     posix_memalign(2)).
> 
> I think the description should be wordsmithed or clarified.  Maybe just
> pointing to the MADV_COLLAPSE intro commit (7d8faaf15545 ("mm/madvise: introduce
> MADV_COLLAPSE sync hugepage collapse")) for the rationale could also be a good
> approach, as the goal of DAMOS_COLLAPSE is no different from that of
> MADV_COLLAPSE.
> 
>>
>> DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse
>> the address range synchronously.
>>
>> This new action may be required to support autotuning with hugepage
>> as a goal[1].
>>
>> [1]: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/
>>
>> ---------
>> Benchmarks:
> 
> I recently heard some tools could treat the above line as the commentary
> area [1] separation line.  Please use a ====-like separator instead.  For
> example,
I will fix it for the next version.
> 
>     Benchmarks
>     ==========
> 
>>
>> Tests were performed on an ARM physical server with MariaDB 10.5 and
>> sysbench. A read-only benchmark was performed with uniform row hitting,
>> which means that all rows are accessed with equal probability.
>>
>> T n, D h: THP set to never, DAMON action set to hugepage
>> T m, D h: THP set to madvise, DAMON action set to hugepage
>> T n, D c: THP set to never, DAMON action set to collapse
>>
>> Memory consumption. Lower is better.
>>
>> +------------------+----------+----------+----------+
>> |                  | T n, D h | T m, D h | T n, D c |
>> +------------------+----------+----------+----------+
>> | Total memory use | 2.07     | 2.09     | 2.07     |
>> | Huge pages       | 0        | 1.3      | 1.25     |
>> +------------------+----------+----------+----------+
>>
>> Performance in TPS (Transactions Per Second). Higher is better.
>>
>> T n, D h: 18324.57
>> T n, D h 18452.69
> 
> "T m, D h" ?
Right, my bad. I will fix it.
> 
>> T n, D c: 18432.17
>>
>> Performance counter
>>
>> I got the number of L1 D/I TLB accesses and the number of D/I TLB
>> accesses that triggered a page walk. I divided the second by the
>> first to get the percentage of page walks per TLB access. The
>> lower the better.
>>
>> +---------------+--------------+--------------+--------------+
>> |               | T n, D h     | T m, D h     | T n, D c     |
>> +---------------+--------------+--------------+--------------+
>> | L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
>> | L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
>> | DTLB walk     | 75011087     | 52800418     | 55895794     |
>> | ITLB walk     | 71577076     | 71505137     | 67262140     |
>> | DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
>> | ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
>> +---------------+--------------+--------------+--------------+
>>
>> - We can see that DAMOS "hugepage" action works only when THP is set
>>   to madvise. "collapse" action works even when THP is set to never.
> 
> Makes sense.
> 
>> - Performance for "collapse" action is slightly lower than "hugepage"
>>   action and THP madvise.
> 
> It would be good to add your theory about where the difference comes from.  I
> suspect that's mainly because the "hugepage" setup was allocating more THP?
Correct. I will add a better description of the behaviour.
>> - Memory consumption is slightly lower for "collapse" than "hugepage"
>>   with THP madvise. This is because khugepaged collapses all VMAs,
>>   while the "collapse" action only collapses the VMAs in the hot region.
> 
> But you use thp=madvise, not thp=always?  So only hot regions, to which
> DAMOS_HUGEPAGE is applied, could use THP.  It is the same as the
> DAMOS_COLLAPSE use case, isn't it?
> 
> I'd rather suspect the naturally aligned region huge page allocation of
> DAMOS_HUGEPAGE as the reason for this difference.  That is, DAMOS_HUGEPAGE-applied
> regions can allocate huge pages at fault time, on multiple user threads.
> Meanwhile, DAMOS_COLLAPSE is executed by the single kdamond (if you
> utilize only a single kdamond).  This might have resulted in DAMOS_HUGEPAGE
> allocating more huge pages faster than DAMOS_COLLAPSE?
This could well be the case. The database used 32 threads, so we may have had
simultaneous faults in all those threads, hence the behavior.
> 
>> - There is an improvement in THP utilization when collapse is triggered
>>   through the "hugepage" or "collapse" actions.
> 
> Could you clarify which data point is showing this?  Maybe "Huge pages" /
> "Total memory use"?  And why?  I again suspect the fault-time huge page
> allocation.
I was looking at the performance counters, specifically the percentage of TLB
accesses that triggered a page walk. I will clarify this point in the next version.
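
For reference, the miss percentages in the performance counter table can be
reproduced from the raw counters as walks / accesses * 100.  The snippet below
is only a sketch: the dictionary layout and the walk_pct() and miss_table()
helper names are made up for illustration, and the numbers are copied from the
table.

```python
# Raw counters copied from the performance counter table.
# The layout and the names are illustrative, not part of the patch.
COUNTERS = {
    "T n, D h": {"l1_dtlb": 127248242753, "l1_itlb": 80332558619,
                 "dtlb_walk": 75011087, "itlb_walk": 71577076},
    "T m, D h": {"l1_dtlb": 125431020479, "l1_itlb": 79346759071,
                 "dtlb_walk": 52800418, "itlb_walk": 71505137},
    "T n, D c": {"l1_dtlb": 125327001821, "l1_itlb": 79298139590,
                 "dtlb_walk": 55895794, "itlb_walk": 67262140},
}


def walk_pct(walks, accesses):
    """Percentage of TLB accesses that triggered a page walk."""
    return 100.0 * walks / accesses


def miss_table(counters):
    """Return {setup: (dtlb_pct, itlb_pct)} for every setup."""
    return {
        setup: (walk_pct(c["dtlb_walk"], c["l1_dtlb"]),
                walk_pct(c["itlb_walk"], c["l1_itlb"]))
        for setup, c in counters.items()
    }


if __name__ == "__main__":
    for setup, (dtlb, itlb) in miss_table(COUNTERS).items():
        print(f"{setup}: DTLB {dtlb:.9f} %, ITLB {itlb:.9f} %")
```

With these inputs, the DTLB walk percentage is lowest for "T m, D h" and the
ITLB walk percentage is lowest for "T n, D c", matching the table.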
> 
>> - "collapse" action is performed synchronously, which means that
>>   page collapses happen earlier and more rapidly.
> 
> But these test results are not showing it clearly.  Rather, the results are
> saying "hugepage" was able to make more huge pages than "collapse".  Still, the
> above sentence makes sense when we talk about "collapsing" operations.  But
> this test is not showing it clearly.  I think we should make the limitation
> of this test clear.
I will add another table clarifying this. My point was that the allocation
in my tests happened earlier and faster with "collapse" than with "hugepage".
>>   This can be
>>   useful or not, depending on the scenario.
>>
>> Collapse action just adds a new option to choose the correct system
>> balance.
> 
> That's a fair point.  I believe we also discussed the pros and cons of
> MADV_COLLAPSE, and concluded MADV_COLLAPSE is worth adding.  For
> DAMOS_COLLAPSE, I don't think we have to do that again.
> 
>>
>> Changes
>> ---------
>> RFC v2 -> v1:
>> Fixed a missing comma in the selftest python script
>> Added performance benchmarks
>>
>> RFC v1 -> RFC v2:
>> Added benchmarks
>> Added damos_filter_type documentation for new action to fix kernel-doc
> 
> Please put changelog in the commentary area, and consider adding links to the
> previous revisions [1].
Ack
> 
>>
>> Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>> ---
> 
> Code looks good to me.  Nonetheless, I'd hope the above commit message and
> benchmark results analysis can be more polished and/or clarified.
> 
> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
> 
> 
> Thanks,
> SJ

Thanks for the review.

I will run some more tests today. The test tries to hit every row in the database,
which may not be realistic. Usually only some parts of the table are really hot.
I will change the test to get something closer to a normal distribution of
table hits.

-- 
Asier Gutierrez
Huawei




* Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
  2026-03-31 15:15   ` Gutierrez Asier
@ 2026-04-01  0:59     ` SeongJae Park
  0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-04-01  0:59 UTC (permalink / raw)
  To: Gutierrez Asier
  Cc: SeongJae Park, artem.kuzin, stepanov.anatoly, wangkefeng.wang,
	yanquanmin1, zuoze1, damon, akpm, linux-mm, linux-kernel

On Tue, 31 Mar 2026 18:15:50 +0300 Gutierrez Asier <gutierrez.asier@huawei-partners.com> wrote:

> Hi SJ,
> 
> On 3/31/2026 4:31 AM, SeongJae Park wrote:
[...]
> Thanks for the review.

Thank you for generously accepting my suggestions and answering questions, too.

> 
> I will run some more tests today. The test tries to hit every row in the database,
> which may not be realistic. Usually only some parts of the table are really hot.
> I will change the test to get something closer to a normal distribution of
> table hits.

Sounds good.  I wouldn't strongly request more detailed test results for this
patch, though.  As the idea is solid, just a simple test showing the expected
behavior is already good enough for me.  I'm rather a bit concerned that you
may go down a rabbit hole while doing more tests.

Nonetheless, if you want to make the commit message more complete, I have no
reason to stop you, either :)


Thanks,
SJ

[...]



end of thread, other threads:[~2026-04-01  0:59 UTC | newest]

Thread overview: 9+ messages
2026-03-30 14:57 [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action gutierrez.asier
2026-03-30 23:43 ` (sashiko review) " SeongJae Park
2026-03-31  0:01   ` SeongJae Park
2026-03-31  1:31 ` SeongJae Park
2026-03-31 10:46   ` Stepanov Anatoly
2026-03-31 10:50     ` Stepanov Anatoly
2026-04-01  0:54       ` SeongJae Park
2026-03-31 15:15   ` Gutierrez Asier
2026-04-01  0:59     ` SeongJae Park
