Re: [PATCH 0/5] mm/khugepaged: add collapse hint machanism for khugepaged and use in mglru

From: Lorenzo Stoakes <ljs@kernel.org>
To: Luka Bai <lukafocus@icloud.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <liam@infradead.org>,
	 Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	 Barry Song <baohua@kernel.org>,
	Lance Yang <lance.yang@linux.dev>,
	 Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Kairui Song <kasong@tencent.com>,
	 Qi Zheng <qi.zheng@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	 Rik van Riel <riel@surriel.com>, Harry Yoo <harry@kernel.org>,
	Jann Horn <jannh@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org, Luka Bai <lukabai@tencent.com>
Subject: Re: [PATCH 0/5] mm/khugepaged: add collapse hint machanism for khugepaged and use in mglru
Date: Tue, 9 Jun 2026 17:06:39 +0100
Message-ID: <aigm-6ykF91FtZBx@lucifer>
In-Reply-To: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>

Hi Luka,

This should have been an RFC (equally, your other recent THP submission, [0]).

THP maintainer resource is highly constrained right now and we simply lack the
bandwidth for larger work at the moment.

In addition, I don't think we want to see any further major changes to THP
without significant work first being done to rework and improve the existing
code base.

We have a lot of technical debt and adding more on top is building on sand,
really.

In general, we expect newcomers to the community to become familiar with the
code base through smaller changes (and to help with review) first, prior to
submitting larger changes.

So we don't really have the resource to review this series at the moment, and
I suggest you focus first on finding ways to refactor and improve the
khugepaged code.

Thanks, Lorenzo

[0]:https://lore.kernel.org/all/20260501-thp_cow-v1-0-005377483738@tencent.com/

On Sun, May 31, 2026 at 12:27:16PM +0800, Luka Bai wrote:
> Khugepaged is a background daemon for collapsing feasible pages together
> into a transparent hugepage in all sorts of orders up to PMD_ORDER. However,
> it doesn't have any preference in its collapsing and just iterates through
> all the qualified mm_structs, scanning their page tables from beginning
> to end. This is quite inefficient, especially for large address spaces,
> considering how slow khugepaged can be, and may waste many hugepage
> resources collapsing memory areas that are seldom accessed.
>
> We would like to give khugepaged some preference hints when we find that
> certain areas are good candidates for collapsing. For example, if some memory
> areas are frequently accessed, then we know that it's valuable to merge
> them into a bigger folio since it will reduce TLB misses.
>
> For example, MGLRU has walk_mm() and lru_gen_look_around(), which are used to
> scan frequently accessed areas to save some work on rmap walking and
> generation elevation. Since they find hot memory areas along the way,
> it should be valuable to merge these areas into larger folios.
> MADV_COLLAPSE can be used, but that will cost too much time, harm
> reclaim performance, and slow down any process that
> enters the slow path of memory allocation. So the better choice should be to
> tell khugepaged to do it asynchronously.
>
> We add a khugepaged collapse hint framework in this patchset. The caller can
> call khugepaged_add_collapse_hint() to add hints for khugepaged so that it
> prioritizes collapsing these specific addresses before doing Round-Robin
> scanning. Each mm_slot which belongs to an mm_struct in the previous
> mm_slots_hash is now a khugepaged_mm_slot; it comprises the old mm_slot
> struct and NR_KHUGEPAGED_PRIORITY_LEVEL instances of struct
> khugepaged_collapse_requests. The request struct for each mm_struct is
> put in the global struct khugepaged_priority_queue according to its
> priority when __khugepaged_enter() is called on this mm (we give each mm request
> structs for hint dispersion and balancing across all the mm_structs, which will
> be added in future patches), and all the hints are put in these request
> structs. Each hint has the target address and the target vma struct. An
> example of the framework is shown below:
>
> global collapse hints queues:
> prio 0 ------()----------------------------------()---------------
>             mm_slot0(process A)                mm_slot1(process B)
>                |                                               |
>            hint0---hint1---hint2---hint3       hint4---hint5---hint6
>
> prio 1 ------()----------------------------------()---------------
>             mm_slot0(process A)                mm_slot1(process B)
>                |                                               |
>              -------                                       hint7---hint8
>
> Khugepaged will scan the queues from the highest priority (prio 0 in
> the graph above) to the lowest priority (prio 1 in the graph), going
> through each list and checking every struct khugepaged_mm_slot (mm_slot0
> and mm_slot1 in the graph above), so it will start from mm_slot0 in the queue
> of priority 0. Then khugepaged will scan all the hints listed in the slot (hint0 ~
> hint3 in the above graph). After handling one hint (no matter whether the collapse
> succeeds or fails), the hint is deleted. If a khugepaged_mm_slot doesn't have any
> hints in it, khugepaged will skip it and scan the next mm_slot at the same priority;
> if there is no hint left in the queue of prio 0, khugepaged will scan the ones
> of prio 1; if there is no hint in any prio queue, it will fall back to Round-Robin
> scanning like before.
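
(Illustration only, not code from the series.) The scan order described in
the paragraph above can be modelled in userspace roughly as follows. This is
a minimal sketch under stated assumptions: struct hint_list, struct model,
add_hint() and next_hint() are hypothetical names, fixed-size arrays stand in
for the kernel lists, and NR_PRIO stands in for NR_KHUGEPAGED_PRIORITY_LEVEL.
Locking, VMA tracking and the actual collapse are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PRIO   2   /* stand-in for NR_KHUGEPAGED_PRIORITY_LEVEL (assumed value) */
#define MAX_SLOTS 4   /* hypothetical cap on tracked mm slots */
#define MAX_HINTS 8   /* hypothetical cap on hints per request */

/* Hypothetical model of one per-mm, per-priority request: a FIFO of addresses. */
struct hint_list {
	unsigned long addr[MAX_HINTS];
	size_t head, tail;	/* pending hints are addr[head..tail) */
};

/* Hypothetical model of the global queues: req[prio][slot]. */
struct model {
	struct hint_list req[NR_PRIO][MAX_SLOTS];
	size_t nr_slots;
};

/* Queue a hint for one slot at one priority; returns 0 on success, -1 otherwise. */
static int add_hint(struct model *m, int prio, size_t slot, unsigned long addr)
{
	struct hint_list *l;

	assert(m != NULL);
	if (prio < 0 || prio >= NR_PRIO || slot >= m->nr_slots)
		return -1;
	l = &m->req[prio][slot];
	if (l->tail == MAX_HINTS)
		return -1;
	l->addr[l->tail++] = addr;
	return 0;
}

/*
 * Take the next hint: lowest prio index first, slots in order within a
 * priority, empty slots skipped. The hint is consumed here, so it is gone
 * whether or not the later collapse succeeds. Returns 1 and stores the
 * address, or 0 when nothing is pending (the caller would then fall back
 * to the round-robin scan).
 */
static int next_hint(struct model *m, unsigned long *addr)
{
	assert(m != NULL && addr != NULL);
	for (int prio = 0; prio < NR_PRIO; prio++) {
		for (size_t s = 0; s < m->nr_slots; s++) {
			struct hint_list *l = &m->req[prio][s];

			if (l->head < l->tail) {
				*addr = l->addr[l->head++];
				return 1;
			}
		}
	}
	return 0;
}
```

In this model a hint queued at prio 0 is always taken before any pending
prio 1 hint, including ones queued earlier. Whether the series behaves the
same way when hints arrive mid-scan is not stated in the cover letter.
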
>
> khugepaged_add_collapse_hint() is for adding hints, and it only gets called
> by walk_mm() and lru_gen_look_around() right now. In the future we may
> call it in more scenarios when we find hot memory areas, for example in DAMON.
>
> We tested the performance by using valkey-server (based on redis) together with
> memtier_benchmark to simulate a Gaussian distribution of get/set operations on
> a 160G, 64-core x86 VM. The dataset is about 3G. After preloading the db, the test
> parameters were as below:
> memtier_benchmark -s 127.0.0.1 -p 6379 \
>       --ratio=1:1 \
>       --key-pattern=G:G \
>       --key-minimum=1 --key-maximum=3000000 \
>       --key-median=2000000 \
>       --key-stddev=150000 \
>       -d 1024 \
>       -t 1 -c 10 \
>       -n 2500000 \
>       --pipeline=32 \
>       --hide-histogram
>
> Since we wanted to see the influence of khugepaged collapse hints on the reduction of
> TLB misses, we made khugepaged scan every 1 second, and used the userspace
> interface to run walk_mm() every 2 seconds for the cgroup that valkey-server was placed in.
> We made sure the server was using only 4k pages before we ran the test, and that only
> khugepaged could collapse them into large folios. We enabled anonymous THP of order 9,
> which is PMD size in most setups. We used perf stat to monitor the TLB miss statistics.
>
> After repeated tests, we saw a 13.50% reduction in dTLB-load-misses and a 5%
> reduction in dTLB-store-misses compared to the setup without any collapse
> hint. The final throughput of memtier_benchmark improved by about 2% to 5%
> on average, which was less pronounced than the TLB miss reduction. We believe
> that is because too many factors influence the final result of a random
> redis test, so the effect of TLB misses on the final throughput was diluted by
> other factors.
>
> Patch Details:
> ========
> * Patch 1 adds the basic khugepaged hint framework introduced
>   above. Details can be seen in the commit itself and the comments in the
>   code.
> * Patch 2 adds a slab cache for khugepaged_collapse_hint, which can
>   improve the performance of allocating and freeing the hints.
> * Patch 3 adds a deduplication mechanism for the hints so that we will
>   not add a hint for an address that already has one.
> * Patch 4 adds accounting for successful collapses initiated by
>   hint or non-hint.
> * Patch 5 adds the collapse hint to lru_gen_look_around() and walk_mm()
>   in MGLRU.
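
(Illustration only, not code from the series.) One way the deduplication
in patch 3 could work is sketched below as a userspace model. The names
struct hint_set and hint_add_unique() are hypothetical. Comparing addresses
after aligning them down to a 2 MiB region is an assumption made for this
sketch; the cover letter only says that a hint pointing to a repeated
address is not added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REGION_SIZE (1UL << 21)	/* assumed 2 MiB PMD-sized region */
#define MAX_HINTS   16		/* hypothetical cap on pending hints */

/* Hypothetical model of the pending hints for one mm: region base addresses. */
struct hint_set {
	unsigned long base[MAX_HINTS];
	size_t n;
};

/*
 * Add a hint unless one for the same region is already pending.
 * Returns true if the hint was added, false if it was a duplicate
 * or the set is full.
 */
static bool hint_add_unique(struct hint_set *s, unsigned long addr)
{
	unsigned long base = addr & ~(REGION_SIZE - 1);

	assert(s != NULL);
	for (size_t i = 0; i < s->n; i++) {
		if (s->base[i] == base)
			return false;	/* region already hinted */
	}
	if (s->n == MAX_HINTS)
		return false;
	s->base[s->n++] = base;
	return true;
}
```

A linear search keeps the sketch short. It costs O(n) per insertion, so a
real implementation handling many hints would probably want a hash or tree
lookup instead.
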
>
> Thanks for reading. Comments and suggestions are very welcome!
>
> Signed-off-by: Luka Bai <lukabai@tencent.com>
> ---
> Luka Bai (5):
>       mm/khugepaged: add framework for khugepaged collapse hint
>       mm/khugepaged: use slab cache instead of normal kmalloc
>       mm/khugepaged: add deduplication when adding new collapse hint
>       mm/khugepaged: add accounting for successful hint or non-hint collapse
>       mm/khugepaged: add khugepaged collapse hint in mglru reference checking
>
>  include/linux/huge_mm.h    |   2 +
>  include/linux/khugepaged.h |  20 ++
>  include/linux/mmzone.h     |  17 +-
>  mm/huge_memory.c           |   4 +
>  mm/khugepaged.c            | 460 ++++++++++++++++++++++++++++++++++++++++++++-
>  mm/rmap.c                  |  27 ++-
>  mm/vmscan.c                |  33 +++-
>  7 files changed, 549 insertions(+), 14 deletions(-)
> ---
> base-commit: e1af79f3291a268adf4e149e1faba3052743e898
> change-id: 20260530-thp_collapse_hint-ec92bd943797
>
> Best regards,
> --
> Luka Bai <lukabai@tencent.com>
>


Thread overview: 17+ messages
2026-05-31  4:27 [PATCH 0/5] mm/khugepaged: add collapse hint machanism for khugepaged and use in mglru Luka Bai
2026-05-31  4:27 ` [PATCH 1/5] mm/khugepaged: add framework for khugepaged collapse hint Luka Bai
2026-05-31  4:27 ` [PATCH 2/5] mm/khugepaged: use slab cache instead of normal kmalloc Luka Bai
2026-05-31  4:27 ` [PATCH 3/5] mm/khugepaged: add deduplication when adding new collapse hint Luka Bai
2026-05-31  4:27 ` [PATCH 4/5] mm/khugepaged: add accounting for successful hint or non-hint collapse Luka Bai
2026-05-31  4:27 ` [PATCH 5/5] mm/khugepaged: add khugepaged collapse hint in mglru reference checking Luka Bai
2026-06-09 10:17 ` [PATCH 0/5] mm/khugepaged: add collapse hint machanism for khugepaged and use in mglru Nico Pache
2026-06-09 14:44   ` Lorenzo Stoakes
2026-06-11  9:23     ` Karim Manaouil
2026-06-11 15:14       ` Zi Yan
2026-06-11 17:16         ` Matthew Wilcox
2026-06-12  1:01           ` Luka Bai
2026-06-11  3:07   ` Luka Bai
2026-06-09 16:06 ` Lorenzo Stoakes [this message]
2026-06-11  3:39   ` Luka Bai
  -- strict thread matches above, loose matches on Subject: below --
2026-05-31  4:23 Luka Bai
2026-05-31  4:40 ` Luka Bai
