public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] mm: Support selecting doing direct COW for anonymous pmd entry
@ 2026-05-01  5:55 Luka Bai
From: Luka Bai @ 2026-05-01  5:55 UTC
  To: linux-mm
  Cc: Jonathan Corbet, Shuah Khan, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Arnd Bergmann, Kairui Song, linux-kernel, linux-arch, linux-doc,
	Luka Bai

Copy-on-write (COW) handling for anonymous PMD-level THP is simple right
now. First we check whether the folio can be used exclusively by the
faulting process. If it can (the folio's refcount is 1 after trying to
free the swapcache, or the PageAnonExclusive flag is set), we reuse it
directly with little further handling. If it cannot, we split the PMD
into 512 4K PTEs and do copy-on-write only for the specific 4K page
that we faulted on.
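
The decision described above can be modelled in user space roughly as
follows. This is only a sketch for discussion, not the code in
do_huge_pmd_wp_page(): struct folio_model, enum pmd_wp_action and
current_pmd_wp_policy() are hypothetical names made up for
illustration.

```c
#include <stdbool.h>

/* Outcomes of a write fault on a write-protected anonymous PMD mapping. */
enum pmd_wp_action {
	PMD_WP_REUSE,        /* keep the 2M folio and make the PMD writable */
	PMD_WP_SPLIT_COW_4K, /* split into PTEs, copy only the faulting 4K page */
};

/*
 * Hypothetical user-space model of the state the fault path looks at.
 * The field names are illustrative and do not mirror struct folio.
 */
struct folio_model {
	bool anon_exclusive; /* models the PageAnonExclusive flag */
	int refcount;        /* references left after trying to free swapcache */
};

/* Current behaviour: reuse if exclusive, otherwise split and COW 4K. */
static enum pmd_wp_action current_pmd_wp_policy(const struct folio_model *f)
{
	if (f->anon_exclusive || f->refcount == 1)
		return PMD_WP_REUSE;
	return PMD_WP_SPLIT_COW_4K;
}
```

Note that the model has no outcome that copies the whole 2M folio;
that missing third outcome is what this series proposes to add.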

This logic is memory efficient, since for most workloads we don't want
to allocate a new 2M folio just for a small write. However, it also
means the process's original 2M mapping is suddenly split on a write,
which can hurt performance. For example, if process A and process B
share an anonymous 2M PMD mapping and process B writes to it, B's page
table mapping changes from one PMD entry into 512 4K PTEs at once, so
the TLB benefit suddenly "vanishes" for process B, which can cause an
observable performance degradation. After that, we can only wait for
khugepaged to collapse the area back into a PMD mapping, which may not
happen soon, if at all.

Beyond the problem above, this logic also weakens THP itself. Currently
THP is a "best-effort" choice with no "certainty": a THP is easily
split into small pages on common paths such as reclaim and COW. Such a
transparent split can cause throughput fluctuation for some workloads.
For these workloads, we may want to give THP some "certainty", just
like hugetlbfs. The effect we want is: after some explicit setup, as
long as the system has a usable folio and the virtual memory alignment
permits (or we set it up to), we always use THP for the range, and the
system never splits it unless the user asks for that.

This patchset addresses both of the points above. First, we add
PMD-level THP COW support by revising do_huge_pmd_wp_page(). It is
guarded by a switch because different workloads have different needs;
for some, saving memory matters more than the 2M TLB gain. The switch
is very similar to "enabled" and "shmem_enabled" in the
transparent_hugepage sysfs directory. THP COW is only enabled when THP
itself is enabled, either globally or by madvise. Second, we add basic
THP setup helpers and a branch in the madvise path, and add the THP COW
choice to it for more fine-grained control. For now the helpers only
support copy-on-write related settings, but in the future we may add
more types of THP configuration, such as swapping.
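
As a rough illustration of the gating rule ("THP COW is only enabled
when THP itself is enabled globally or by madvise"), here is a small
user-space model. It assumes the new switch takes the same
always/madvise/never values as "enabled", which the text above only
implies; enum thp_mode, struct vma_model and pmd_cow_allowed() are
hypothetical names, not the helpers added by the patches.

```c
#include <stdbool.h>

/* Assumed switch values, mirroring the existing "enabled" knob. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical per-range state set up through madvise(). */
struct vma_model {
	bool hugepage_advised; /* range was marked with MADV_HUGEPAGE */
	bool cow_advised;      /* range opted in to PMD-level COW (assumed) */
};

/* A mode permits a range if it is "always", or "madvise" plus advice. */
static bool mode_allows(enum thp_mode mode, bool advised)
{
	return mode == THP_ALWAYS || (mode == THP_MADVISE && advised);
}

/* PMD-level COW needs THP enabled for the range AND the COW switch on. */
static bool pmd_cow_allowed(enum thp_mode thp_enabled,
			    enum thp_mode cow_mode,
			    const struct vma_model *vma)
{
	if (!mode_allows(thp_enabled, vma->hugepage_advised))
		return false;
	return mode_allows(cow_mode, vma->cow_advised);
}
```

With this model, a range that has MADV_HUGEPAGE but no COW advice only
gets a 2M copy when the COW switch is "always".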

Patch Details:
==============
* Patch 1 adds the basic THP setup helpers and branch in the madvise
  path, and then adds the THP COW parameter to it.
* Patch 2 adds the THP COW sysfs interface; the logic is very similar
  to "enabled" and "shmem_enabled" of THP.
* Patch 3 adds the helpers used in the actual COW path to decide
  whether to do PMD-level THP COW.
* Patch 4 reworks map_anon_folio_pmd_nopf and map_anon_folio_pmd_pf
  so that they can map the newly copied folio when the fault flags
  include FAULT_FLAG_UNSHARE.
* Patch 5 adds the actual support for PMD-level THP COW, and uses all
  the switches and helpers from the previous 4 patches for policy
  control.
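
For anyone scripting against the sysfs side: the existing "enabled"
file reports its active value in square brackets, e.g.
"always [madvise] never". Assuming the new COW knob follows the same
format (its file name is not spelled out in this cover letter), a
user-space parser could look like the sketch below.
thp_active_value() is a hypothetical helper written for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed token of a line such as "always [madvise] never"
 * into out. Returns 0 on success, or -1 if there is no non-empty
 * bracketed token or it does not fit into out_len bytes.
 */
static int thp_active_value(const char *line, char *out, size_t out_len)
{
	const char *lb = strchr(line, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (!lb || !rb)
		return -1;

	len = (size_t)(rb - lb - 1);
	if (len == 0 || len >= out_len)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would fgets() one line from
/sys/kernel/mm/transparent_hugepage/enabled and pass it in; the same
helper should work for the new knob if it keeps that format.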

Thanks for reading. Comments and suggestions are very welcome!

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
Luka Bai (5):
      mm: add basic madvise helpers and branch for THP setup
      mm: add pmd level THP COW parameter in sysfs
      mm: add pmd level THP COW judgement helpers
      mm: enable map_anon_folio_pmd_nopf to handle unshare
      mm: support choosing to do THP COW for anonymous pmd entry.

 .../testing/sysfs-kernel-mm-transparent-hugepage   |   1 +
 Documentation/admin-guide/mm/transhuge.rst         |  27 +++
 include/linux/huge_mm.h                            |  45 ++++-
 include/linux/mm.h                                 |  19 ++
 include/uapi/asm-generic/mman-common.h             |   9 +
 mm/huge_memory.c                                   | 198 ++++++++++++++++++---
 mm/khugepaged.c                                    |   8 +-
 mm/madvise.c                                       |  25 +++
 8 files changed, 308 insertions(+), 24 deletions(-)
---
base-commit: 41cd9e3d23b8fd9e6c3c0311e9cb0304442c6141
change-id: 20260501-thp_cow-94873ed30793

Best regards,
--  
Luka Bai <lukabai@tencent.com>



Thread overview: 13+ messages
2026-05-01  5:55 [PATCH 0/5] mm: Support selecting doing direct COW for anonymous pmd entry Luka Bai
2026-05-01  5:55 ` [PATCH 1/5] mm: add basic madvise helpers and branch for THP setup Luka Bai
2026-05-01  5:55 ` [PATCH 2/5] mm: add pmd level THP COW parameter in sysfs Luka Bai
2026-05-01  5:55 ` [PATCH 3/5] mm: add pmd level THP COW judgement helpers Luka Bai
2026-05-01  5:55 ` [PATCH 4/5] mm: enable map_anon_folio_pmd_nopf to handle unshare Luka Bai
2026-05-01  5:55 ` [PATCH 5/5] mm: support choosing to do THP COW for anonymous pmd entry Luka Bai
2026-05-01  7:11   ` David Hildenbrand (Arm)
2026-05-01 15:01     ` Luka Bai
2026-05-01  7:07 ` [PATCH 0/5] mm: Support selecting doing direct " David Hildenbrand (Arm)
2026-05-01 16:16   ` Luka Bai
2026-05-01 18:30     ` David Hildenbrand (Arm)
2026-05-02  5:06       ` Luka Bai
2026-05-03  7:03 ` [syzbot ci] " syzbot ci
