Linux-mm Archive on lore.kernel.org
* [RFC PATCH 0/8] Introduce Reserved THP
@ 2026-06-27  7:21 Qi Zheng
From: Qi Zheng @ 2026-06-27  7:21 UTC
  To: akpm, david, ljs, ziy, baolin.wang, liam, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, osalvador, chrisl,
	kasong, shikemeng, nphamcs, baoquan.he, youngjun.park, peterx,
	usama.arif, willy, vbabka, surenb, mhocko, jackmanb, hannes
  Cc: linux-mm, linux-kernel, Qi Zheng

From: Qi Zheng <zhengqi.arch@bytedance.com>

Hi all,

This RFC patchset introduces a new feature called "Reserved THP", and I'd like
to open a discussion on using it as a stepping stone toward unifying HugeTLB
and THP (Transparent Huge Pages).

1. Background
=============

Currently, two huge page solutions co-exist in the kernel:

1. HugeTLB: Supports reservation, guaranteeing successful allocation from the
            reserved pool. However, it does not support features like swap, and
            it is a relatively independent subsystem.
2. THP: Does not support reservation, so allocation may fail and fall back to
        small pages when system memory is fragmented. However, it is more
        tightly integrated with the mm core and supports features like swap.

Both have their pros and cons. However, one of our internal workloads needs
features from both.

In this workload, a user process needs to reserve double the amount of
HugeTLB memory because of hot-upgrade requirements. For example, if the
process needs 16GB of HugeTLB, another 16GB is required during the hot
upgrade to back the new process's allocations. After the upgrade, the old
process exits and releases its 16GB of HugeTLB. As a result, the extra 16GB
of HugeTLB sits idle most of the time.

A straightforward idea is to use the HugeTLB CMA feature, reserving a total
of 32GB of hugetlb_cma. During normal operation, 16GB is consumed, and the
remaining 16GB can be used by other processes. During the hot upgrade, we
could try to migrate the other processes' memory out of the CMA area to
allocate the extra 16GB of HugeTLB. This might work, but it still requires
reserving 32GB of memory.

We also found that during the hot upgrade, about 10GB of the old process's
HugeTLB memory is actually cold and could, in theory, be reclaimed. In the
extreme case, we could reserve only 22GB of memory and reclaim those 10GB of
cold memory during the hot upgrade. Unfortunately, HugeTLB does not currently
support swap, and adding it seems quite difficult.
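The sizing argument above is simple arithmetic. As a minimal illustration
using the 16GB/10GB figures from this scenario (the struct and helper names
are hypothetical, and 2MB huge pages are assumed):

```c
#include <assert.h>

/* Sizes in GB, following the hot-upgrade scenario above. */
struct upgrade_plan {
	unsigned long steady_gb; /* huge page memory the running process needs */
	unsigned long cold_gb;   /* cold part of it that could be swapped out */
};

/* HugeTLB today: old and new process must both be fully backed. */
static unsigned long hugetlb_reserve_gb(const struct upgrade_plan *p)
{
	return 2 * p->steady_gb;
}

/* With swap: the old process's cold memory is reclaimed during the upgrade. */
static unsigned long swappable_reserve_gb(const struct upgrade_plan *p)
{
	assert(p->cold_gb <= p->steady_gb);
	return 2 * p->steady_gb - p->cold_gb;
}

/* Number of 2MB huge pages needed to back @gb GB. */
static unsigned long nr_2m_pages(unsigned long gb)
{
	return gb * 1024 / 2;
}
```

For these figures that is 32GB versus 22GB, i.e. 16384 versus 11264 2MB pages.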

Therefore, we are wondering whether we can introduce "reserved THP", which is
THP that can be reserved. It can be consumed only through explicit opt-in such
as madvise(); normal memory allocations cannot consume it. This gives an effect
similar to HugeTLB. And because it is THP, supporting swap should be relatively
easy, which would solve the problem above.
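From userspace, opting a range into reserved THP would presumably look like
any other madvise() hint. The sketch below is an illustration only: it
assumes MADV_RESERVED_THP is exported by patched uapi headers (its numeric
value is not given here), falls back to the existing MADV_HUGEPAGE hint when
the constant is absent so it still builds on an unpatched system, and assumes
2MB PMD-sized THP. The helper map_reserved_thp() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PMD_THP_SIZE (2UL << 20) /* assumption: 2MB PMD-sized THP */

#ifdef MADV_RESERVED_THP
#define RESERVED_THP_ADVICE MADV_RESERVED_THP /* from patched uapi headers */
#else
#define RESERVED_THP_ADVICE MADV_HUGEPAGE     /* fallback: ordinary THP hint */
#endif

/*
 * Hypothetical helper: map @len bytes (a multiple of PMD_THP_SIZE) of
 * anonymous memory aligned to a 2MB boundary and apply RESERVED_THP_ADVICE.
 * Returns the region or NULL if mmap() fails; *adv_err receives 0 if
 * madvise() succeeded, otherwise its errno.
 */
static void *map_reserved_thp(size_t len, int *adv_err)
{
	assert(len > 0 && len % PMD_THP_SIZE == 0);

	/* Over-allocate by one huge page so the start can be aligned. */
	size_t span = len + PMD_THP_SIZE;
	char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	/* Round up to the next 2MB boundary and drop the unused head/tail. */
	uintptr_t start = ((uintptr_t)raw + PMD_THP_SIZE - 1) & ~(PMD_THP_SIZE - 1);
	size_t head = start - (uintptr_t)raw;
	size_t tail = span - head - len;
	if (head)
		munmap(raw, head);
	if (tail)
		munmap((char *)start + len, tail);

	*adv_err = madvise((void *)start, len, RESERVED_THP_ADVICE) ? errno : 0;
	return (void *)start;
}
```

The madvise() error is reported separately rather than treated as fatal,
since whether the advice is accepted depends on the running kernel's
configuration.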

Additionally, there have been discussions since at least 2024 about the
possibility of unifying HugeTLB and THP:

Link: https://lwn.net/Articles/974491/

After all, HugeTLB is managed relatively independently and requires a lot of
special handling in the mm core. The introduction of reserved THP might be an
opportunity to change that. In the future, reserved THP could be extended to
support various HugeTLB features, such as acting as a backend for hugetlbfs.
Once reserved THP can completely replace HugeTLB, HugeTLB could be removed
entirely, and reserved THP would simply become a feature of THP.

2. Implementation
=================

In 2024, Yu Zhao proposed a similar idea:

Link: https://lore.kernel.org/all/20240229183436.4110845-2-yuzhao@google.com/

The idea was to introduce two virtual zones, ZONE_NOSPLIT and ZONE_NOMERGE, to
guarantee THP allocation success, achieving an effect similar to reservation.
However, there seems to have been no further progress, perhaps because of
reluctance to introduce more virtual zones like ZONE_MOVABLE.

This RFC proposes a different implementation for discussion:

1. Introduce a new migratetype: MIGRATE_RESERVED_THP.
2. Introduce two new HugeTLB-like kernel boot parameters: `thp_reserved_size`
   and `thp_reserved_nr`. When set, the requested amount of memory is marked
   as MIGRATE_RESERVED_THP and returned to the buddy allocator.
3. Introduce a new madvise advice value: `MADV_RESERVED_THP`. Memory in
   MIGRATE_RESERVED_THP pageblocks can only be consumed via
   `madvise(MADV_RESERVED_THP)`; normal memory allocations cannot consume it.
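As a rough sketch of how the boot parameters might translate into
pageblocks, assuming 2MB pageblocks and that `thp_reserved_size` takes a
K/M/G-suffixed size like other size-valued boot parameters such as
hugetlb_cma=: parse_size() is a userspace stand-in for the kernel's
memparse(), and all helper names here are hypothetical. The actual parsing in
the series may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define PAGEBLOCK_SIZE (2ULL << 20) /* assumption: 2MB pageblocks */

/* Userspace stand-in for the kernel's memparse() (K/M/G only): "16G" -> bytes. */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v;

	assert(s != NULL);
	v = strtoull(s, &end, 0);
	switch (toupper((unsigned char)*end)) {
	case 'G':
		v <<= 10;
		/* fall through */
	case 'M':
		v <<= 10;
		/* fall through */
	case 'K':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Pageblocks to mark MIGRATE_RESERVED_THP for "thp_reserved_size=<size>". */
static unsigned long long pageblocks_for_size(const char *arg)
{
	unsigned long long bytes = parse_size(arg);

	/* Round up so the whole requested size is covered. */
	return (bytes + PAGEBLOCK_SIZE - 1) / PAGEBLOCK_SIZE;
}

/*
 * Pageblocks for "thp_reserved_nr=<count>", assuming the count is in
 * PMD-sized THPs and each one occupies exactly one 2MB pageblock.
 */
static unsigned long long pageblocks_for_nr(const char *arg)
{
	return strtoull(arg, NULL, 0);
}
```

Under these assumptions, `thp_reserved_size=16G` and `thp_reserved_nr=8192`
describe the same 8192 pageblocks.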

This achieves a reservation effect similar to HugeTLB and guarantees
allocation success as long as the reserved pool is not exhausted.
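The key property of the new migratetype is one-way isolation: reserved
pageblocks never serve ordinary allocations. A toy model of that rule,
loosely patterned on the buddy allocator's per-migratetype free lists and
fallback order (the enum, fallback table, and toy_alloc() are illustrative,
not the series' code). Whether a reserved-THP request may fall back to
ordinary memory when the pool is empty is not stated; this sketch simply
fails in that case.

```c
#include <assert.h>

/* Toy migratetypes; MT_RESERVED_THP stands in for MIGRATE_RESERVED_THP. */
enum migratetype {
	MT_UNMOVABLE,
	MT_MOVABLE,
	MT_RECLAIMABLE,
	MT_RESERVED_THP,
	MT_NR,
};

/* Free pageblocks per migratetype in a toy zone. */
struct toy_zone {
	unsigned long nr_free[MT_NR];
};

/* Fallback order for ordinary types; MT_RESERVED_THP never appears here. */
static const enum migratetype fallbacks[MT_RESERVED_THP][MT_RESERVED_THP - 1] = {
	[MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE },
	[MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE },
	[MT_RECLAIMABLE] = { MT_UNMOVABLE, MT_MOVABLE },
};

/* Take one free pageblock from @mt's list if one is available. */
static int take(struct toy_zone *z, enum migratetype mt)
{
	if (!z->nr_free[mt])
		return -1;
	z->nr_free[mt]--;
	return mt;
}

/*
 * Allocate one pageblock for a request of type @mt.  Reserved-THP requests
 * are served only from the reserved pool; ordinary requests try their own
 * list and then the fallbacks, but never the reserved pool.
 * Returns the migratetype the block came from, or -1 on failure.
 */
static int toy_alloc(struct toy_zone *z, enum migratetype mt)
{
	assert(mt < MT_NR);

	if (mt == MT_RESERVED_THP)
		return take(z, MT_RESERVED_THP);

	if (take(z, mt) >= 0)
		return mt;
	for (int i = 0; i < MT_RESERVED_THP - 1; i++) {
		int got = take(z, fallbacks[mt][i]);

		if (got >= 0)
			return got;
	}
	return -1;
}
```

With only reserved pageblocks left free, an ordinary request fails while a
reserved-THP request still succeeds.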

3. Future Plans
===============

3.1 Enhance swap-out and swap-in for large folios
-------------------------------------------------

Currently, THP_SWAP supports swap-out, but it only tries to swap out a THP
folio as a whole; the folio may still be forced to split in some situations
(e.g., fragmented swap space, memory.swap.max limits). For swap-in, it is
almost impossible to swap in a THP folio directly as a whole.

Reserved THP, however, must not be split. We need to ensure that it remains a
whole huge page across swap-out and swap-in, providing the equivalent of swap
support for HugeTLB.
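The constraint can be shown with a toy model of swap slot allocation:
swapping a folio out without splitting needs a contiguous, naturally aligned
run of free slots, which fragmented swap space may not provide even when
enough slots are free in total. The model and find_aligned_run() are
illustrative assumptions, not how the swap allocator is actually implemented
(it works in clusters).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: find @nr free, contiguous, naturally aligned slots in
 * @used[0..nslots).  Returns the first slot index, or -1 if the free space
 * is too fragmented.  A regular THP could split at that point; a reserved
 * THP could not.
 */
static long find_aligned_run(const bool *used, size_t nslots, size_t nr)
{
	assert(nr > 0 && (nr & (nr - 1)) == 0); /* folio sizes are powers of two */

	for (size_t base = 0; base + nr <= nslots; base += nr) {
		size_t i;

		for (i = 0; i < nr; i++)
			if (used[base + i])
				break;
		if (i == nr)
			return (long)base;
	}
	return -1;
}
```

For example, with slots 1, 6 and 12 of 16 in use, 13 slots are free, yet
there is no aligned run of 8.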

3.2 Integrate reserved THP into the common reclaim path
-------------------------------------------------------

Once huge pages can be swapped in and out without splitting, reserved THP can
be integrated into the common reclaim path as a normal LRU folio. This fills
the gap left by HugeTLB's lack of swap support.

3.3 Use reserved THP as a backend for shmem/tmpfs
-------------------------------------------------

This would allow shared and file-like usage (shmem/tmpfs) to use reserved THP.

3.4 Use reserved THP as a backend for hugetlbfs
-----------------------------------------------

This would allow existing HugeTLB users and applications to switch to
reserved THP without modification.

3.5 Add 1GB page support to reserved THP
----------------------------------------

Historically, there have been several attempts to add 1GB huge page support to
THP:

1. https://lore.kernel.org/linux-mm/20260202005451.774496-1-usamaarif642@gmail.com/
2. https://lore.kernel.org/linux-mm/20210224223536.803765-1-zi.yan@sent.com/

Adding 1GB huge page support to reserved THP would be simpler than adding it
to regular THP.
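One way to see why the reserved case may be simpler is order arithmetic. With
4KB base pages, a 2MB THP is an order-9 block, while a 1GB page would be
order 18, above the buddy allocator's maximum order (MAX_PAGE_ORDER, 10 by
default), so regular THP cannot obtain 1GB pages from the buddy allocator on
demand. A pool set aside at boot could presumably be laid out in 1GB-aligned
chunks up front instead. This is an illustration, not reasoning given in the
series, and the constants below are assumptions:

```c
#include <assert.h>

#define BASE_PAGE_SHIFT 12        /* assumption: 4KB base pages */
#define ASSUMED_MAX_PAGE_ORDER 10 /* default buddy limit (MAX_PAGE_ORDER) */

/* Buddy order of a naturally aligned block of 2^size_shift bytes. */
static unsigned int order_for_shift(unsigned int size_shift)
{
	assert(size_shift >= BASE_PAGE_SHIFT);
	return size_shift - BASE_PAGE_SHIFT;
}

/* Whether the buddy allocator could hand out such a block directly. */
static int buddy_can_serve(unsigned int size_shift)
{
	return order_for_shift(size_shift) <= ASSUMED_MAX_PAGE_ORDER;
}
```

So a 2MB THP (shift 21, order 9) fits within the buddy allocator, while a 1GB
page (shift 30, order 18) does not.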

3.6 Remove HugeTLB
------------------

Once reserved THP can completely replace the existing functionality of
HugeTLB, we can gradually remove HugeTLB, leaving only one huge page
management system in the kernel.

This series is based on next-20260623.

Comments and feedback are welcome!

Thanks,
Qi

Qi Zheng (8):
  mm: page_alloc: add reserved THP pageblock type
  mm: add boot-time reserved THP pageblock capacity
  mm: page_alloc: add a reserved THP allocation primitive
  mm: add reserved THP quota helpers
  mm: add reserved THP vma flag
  mm: maintain reserved THP quota across VMA changes
  mm: support reserved THP VMAs in anonymous faults
  mm: add MADV_RESERVED_THP range policy

 arch/alpha/include/uapi/asm/mman.h     |   2 +
 arch/mips/include/uapi/asm/mman.h      |   2 +
 arch/parisc/include/uapi/asm/mman.h    |   2 +
 arch/xtensa/include/uapi/asm/mman.h    |   2 +
 fs/proc/task_mmu.c                     |   3 +
 include/linux/gfp.h                    |   3 +
 include/linux/gfp_types.h              |   8 +-
 include/linux/huge_mm.h                |   4 +-
 include/linux/mm.h                     |   7 ++
 include/linux/mmzone.h                 |  11 +-
 include/trace/events/mmflags.h         |   4 +-
 include/uapi/asm-generic/mman-common.h |   2 +
 mm/Makefile                            |   2 +-
 mm/huge_memory.c                       |  18 +++-
 mm/internal.h                          |   6 ++
 mm/khugepaged.c                        |   8 ++
 mm/madvise.c                           |  83 ++++++++++++++-
 mm/memory.c                            |   3 +
 mm/mmap.c                              |  18 ++++
 mm/mremap.c                            | 121 ++++++++++++++++------
 mm/page_alloc.c                        |  73 +++++++++++++-
 mm/reserved_thp.c                      | 133 +++++++++++++++++++++++++
 mm/show_mem.c                          |   5 +
 mm/vma.c                               |  23 +++++
 mm/vma.h                               |   1 +
 tools/include/linux/gfp_types.h        |   4 +-
 tools/perf/builtin-kmem.c              |   1 +
 tools/testing/vma/include/dup.h        |   1 +
 28 files changed, 499 insertions(+), 51 deletions(-)
 create mode 100644 mm/reserved_thp.c

-- 
2.54.0

Thread overview: 12+ messages
2026-06-27  7:21 [RFC PATCH 0/8] Introduce Reserved THP Qi Zheng
2026-06-27  7:21 ` [RFC PATCH 1/8] mm: page_alloc: add reserved THP pageblock type Qi Zheng
2026-06-27  7:21 ` [RFC PATCH 2/8] mm: add boot-time reserved THP pageblock capacity Qi Zheng
2026-06-27  7:21 ` [RFC PATCH 3/8] mm: page_alloc: add a reserved THP allocation primitive Qi Zheng
2026-06-27  7:21 ` [RFC PATCH 4/8] mm: add reserved THP quota helpers Qi Zheng
2026-06-27  7:21 ` [RFC PATCH 5/8] mm: add reserved THP vma flag Qi Zheng
2026-06-27  7:26 ` [RFC PATCH 6/8] mm: maintain reserved THP quota across VMA changes Qi Zheng
2026-06-27  7:26 ` [RFC PATCH 7/8] mm: support reserved THP VMAs in anonymous faults Qi Zheng
2026-06-27  7:26 ` [RFC PATCH 8/8] mm: add MADV_RESERVED_THP range policy Qi Zheng
2026-06-29  3:46 ` [RFC PATCH 0/8] Introduce Reserved THP Matthew Wilcox
2026-06-29 10:13   ` Qi Zheng
2026-06-29 12:20 ` David Hildenbrand (Arm)
