public inbox for linux-kernel@vger.kernel.org
* [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files
@ 2026-04-29 15:29 Zi Yan
From: Zi Yan @ 2026-04-29 15:29 UTC
  To: Andrew Morton, David Hildenbrand, Matthew Wilcox (Oracle),
	Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

I will be AFK for most of May, so my responses might be delayed.

Hi all,

This patchset removes the READ_ONLY_THP_FOR_FS Kconfig option and, by
default, enables creating file-backed THPs for FSes with large folio
support (the supported orders need to include PMD_ORDER), including for
writable files. It is on top of mm-new.


Before this patchset, read-only THP creation is supported as follows:

                            |    PF     | MADV_COLLAPSE | khugepaged |
                            |-----------|---------------|------------|
 large folio FSes only      |     ✓     |       x       |      x     |
 READ_ONLY_THP_FOR_FS only  |     x     |       ✓       |      ✓     |
 both                       |     ✓     |       ✓       |      ✓     |

where "READ_ONLY_THP_FOR_FS only" refers to FSes without large folio support.


Now without READ_ONLY_THP_FOR_FS:

                                  |    PF     | MADV_COLLAPSE | khugepaged |
                                  |-----------|---------------|------------|
 large folio FSes (read-only fd)  |     ✓     |       ✓       |      ✓     |
 large folio FSes (read-write fd) |     ✓     |       ✓       |      ✓*    |
 no large folio FSes              |     x     |       x       |      x     |

* khugepaged only collapses clean folios from writable files. Userspace
  must flush dirty folios explicitly before khugepaged can collapse them.
  MADV_COLLAPSE handles the flush automatically via its writeback-and-retry
  path. Collapsing writable MAP_PRIVATE pagecache folios is still not
  supported, since PMD THP CoW only faults in at PTE level to avoid long
  CoW latency, and file_backed_vma_is_retractable() prevents it.
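
For illustration only, the userspace side of the footnote above could look
roughly like the sketch below: dirty a PMD-sized range of a file opened
read-write, flush it with fsync(), map it at a PMD-aligned address and call
madvise(MADV_COLLAPSE). The 2 MiB HPAGE_SIZE, the scratch file under /tmp and
the helper name flush_then_collapse() are assumptions made for the example and
are not part of the series. Whether the collapse succeeds depends on the
kernel and on the filesystem, so the helper just reports the errno.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* uapi value, available since Linux 6.1 */
#endif

/* Assumption: 2 MiB PMD size (e.g. x86-64 with 4 KiB base pages). */
#define HPAGE_SIZE (2UL << 20)

static_assert((HPAGE_SIZE & (HPAGE_SIZE - 1)) == 0,
	      "the alignment mask below needs a power of two");

/*
 * Dirty one PMD-sized range of a scratch file, flush it, then ask for a
 * collapse. Returns 0 if madvise() succeeded, a positive errno if the
 * kernel refused, and -1 if the setup itself failed.
 */
int flush_then_collapse(void)
{
	char path[] = "/tmp/thp-sketch-XXXXXX";	/* hypothetical scratch file */
	char *buf = NULL, *area = MAP_FAILED, *map;
	size_t off;
	ssize_t n;
	int fd, rc = -1;

	fd = mkstemp(path);	/* mkstemp() opens the file read-write */
	if (fd < 0)
		return -1;
	unlink(path);

	buf = malloc(HPAGE_SIZE);
	if (!buf)
		goto out;
	memset(buf, 0xab, HPAGE_SIZE);
	for (off = 0; off < HPAGE_SIZE; off += (size_t)n) {
		n = write(fd, buf + off, HPAGE_SIZE - off);
		if (n <= 0)
			goto out;
	}

	/* Explicit flush: afterwards the pagecache folios are clean. */
	if (fsync(fd) != 0)
		goto out;

	/* Over-reserve so that a PMD-aligned window can be carved out. */
	area = mmap(NULL, 2 * HPAGE_SIZE, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		goto out;
	map = (char *)(((uintptr_t)area + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1));
	map = mmap(map, HPAGE_SIZE, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	rc = madvise(map, HPAGE_SIZE, MADV_COLLAPSE) == 0 ? 0 : errno;
out:
	if (area != MAP_FAILED)
		munmap(area, 2 * HPAGE_SIZE);
	free(buf);
	close(fd);
	return rc;
}
```

On kernels without MADV_COLLAPSE the call is expected to fail with EINVAL,
which the helper returns as a positive value.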

This means no-large-folio FSes need to add large folio support (the
supported orders need to include PMD_ORDER), so that they can leverage
file THP creation.
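
As a rough userspace model of that requirement (not the kernel code), the
check boils down to comparing the largest folio order a mapping supports
against PMD_ORDER. struct mapping_model and mapping_supports_pmd_thp() are
hypothetical names for this sketch, and PMD_ORDER is assumed to be 9, as on
x86-64 with 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: x86-64 with 4 KiB base pages, where a PMD maps 2^9 pages. */
#define PMD_ORDER 9

/* Hypothetical stand-in for the one property of struct address_space used here. */
struct mapping_model {
	unsigned int max_folio_order;	/* what mapping_max_folio_order() reports */
};

/* A mapping qualifies for file THP only if its largest order covers a PMD. */
bool mapping_supports_pmd_thp(const struct mapping_model *m)
{
	assert(m);
	return m->max_folio_order >= PMD_ORDER;
}
```

A FS that only supports, say, order-4 folios would therefore still be
reported as ineligible by this model.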

To avoid breaking file THP support for large folio FSes, the series is
ordered as follows:
1. patches 1-4 enable the support, so that file THP still works for large
   folio FSes without READ_ONLY_THP_FOR_FS,
2. patch 5 removes the READ_ONLY_THP_FOR_FS Kconfig option,
3. patches 6-12 remove code related to READ_ONLY_THP_FOR_FS,
4. patches 13-14 enable clean pagecache folio collapse for writable files.


NOTE: collapsing writable MAP_PRIVATE pagecache folios is not supported,
since:
1. PMD THP CoW only faults in at PTE level to avoid long CoW latency,
2. the first check, due to 1, in file_backed_vma_is_retractable() prevents it.


Overview
===

1. collapse_file() checks whether the to-be-collapsed folios are dirty
   after they are locked and unmapped, so that no new write can happen.
   Previously, mapping->nr_thps and inode->i_writecount were used to
   truncate read-only THPs before a fd becomes writable.

2. hugepage_enabled() is true for the anon, shmem, and file-backed cases
   if the global khugepaged control is on. Otherwise, khugepaged is turned
   off for the file-backed case, and the anon and shmem cases depend on
   their per-size control knobs.

3. collapse_file() in mm/khugepaged.c, instead of checking
   CONFIG_READ_ONLY_THP_FOR_FS, makes sure that mapping_max_folio_order()
   of the file's struct address_space is at least PMD_ORDER.

4. file_thp_enabled() checks mapping_max_folio_order() instead of
   CONFIG_READ_ONLY_THP_FOR_FS and no longer checks if the file is opened
   read-only. The dirty folio check after try_to_unmap() (Change 1)
   handles writable files correctly.

5. truncate_inode_partial_folio() calls folio_split() directly instead
   of the removed try_folio_split_to_order(), since large folios can
   only show up on a FS with large folio support.

6. nr_thps is removed from struct address_space, since it is no longer
   needed to drop all read-only THPs from a FS without large folio
   support when the fd becomes writable. The related filemap_nr_thps*()
   helpers are removed too.

7. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.

8. collapse_file() only calls filemap_flush() for read-only files.
   Blindly flushing dirty folios from writable files would cause
   undesirable system-wide writeback; userspace is expected to flush
   explicitly, or use MADV_COLLAPSE which handles it via its retry path.

9. Updated comments and selftests in various places.
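
To make items 1, 2 and 8 above easier to follow, here is a small userspace
model of the decisions they describe. It is only a sketch: struct thp_knobs,
khugepaged_enabled_for() and dirty_folio_action() are hypothetical names, the
knobs are flattened into booleans, and the two dirty-folio rules are merged
into one function, which is not how the kernel code is structured.

```c
#include <assert.h>
#include <stdbool.h>

enum thp_kind { THP_ANON, THP_SHMEM, THP_FILE };

/* Hypothetical, flattened view of the control knobs. */
struct thp_knobs {
	bool global_on;		/* global khugepaged control */
	bool anon_per_size_on;	/* some per-size anon knob is enabled */
	bool shmem_per_size_on;	/* some per-size shmem knob is enabled */
};

/* Item 2: with the global control on, every kind is eligible. */
bool khugepaged_enabled_for(const struct thp_knobs *k, enum thp_kind kind)
{
	assert(k);
	if (k->global_on)
		return true;
	switch (kind) {
	case THP_ANON:
		return k->anon_per_size_on;
	case THP_SHMEM:
		return k->shmem_per_size_on;
	default:
		return false;	/* file-backed: off without the global control */
	}
}

enum dirty_action {
	ACT_COLLAPSE,		/* folio is clean, collapse may proceed */
	ACT_FLUSH_AND_BAIL,	/* read-only file: filemap_flush(), give up for now */
	ACT_BAIL,		/* writable file: flushing is left to userspace */
};

/* Items 1 and 8 combined: reaction to a candidate folio's dirty state. */
enum dirty_action dirty_folio_action(bool folio_dirty, bool file_writable)
{
	if (!folio_dirty)
		return ACT_COLLAPSE;
	return file_writable ? ACT_BAIL : ACT_FLUSH_AND_BAIL;
}
```

The point of the second function is that a dirty folio never gets collapsed;
the only difference between read-only and writable files is who triggers the
writeback.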


Changelog
===
From V4[5]:
1. fixed Patch 1's compilation error in !CONFIG_TRANSPARENT_HUGEPAGE

2. changed Patch 3 to no longer enable collapse for read-write fds and to
   allow only read-only fds.

3. added two new patches to enable clean pagecache folio collapse for
   writable files:
   - Patch 13: remove inode_is_open_for_write() from file_thp_enabled()
     so that khugepaged and MADV_COLLAPSE can process writable files.
     filemap_flush() in collapse_file() is now conditional on the file
     being read-only, to avoid repeatedly writing back dirty folios from
     writable files.
   - Patch 14: add read_write_file_read_ops and read_write_file_write_ops
     to the khugepaged selftest to cover the new writable-file collapse paths.

From V3[4]:
1. added a TODO comment in patch 1 noting that the is_shmem exception in
   the VM_WARN_ON_ONCE() check can be removed once shmem always calls
   mapping_set_large_folios() on its mapping. Used VM_WARN_ON_ONCE() in
   mapping_pmd_thp_support() instead.

2. fixed the dirty folio bail-out path in patch 2: add xas_unlock_irq()
   and folio_putback_lru() before the goto, which were missing and would
   have left the XA lock held and the LRU isolation ref leaked.

3. renamed hugepage_pmd_enabled() to hugepage_enabled() to reflect it
   controls khugepaged for all transparent hugepage types.

4. reverted the comment in hugepage_enabled() in patch 4 to the original;
   only removed the phrase "when configured in," which referred to
   CONFIG_READ_ONLY_THP_FOR_FS.

5. fixed commit message in patch 6: the dirty folio check is added after
   try_to_unmap() in collapse_file(), not after try_to_unmap_flush().
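
The bail-out fix in item 2 above follows the usual "undo before goto" pattern.
The toy model below illustrates it in userspace: the model_*() helpers and
struct collapse_model are hypothetical stand-ins that only track whether the
lock is held and how many isolation refs are outstanding, and the success path
is simplified compared to collapse_file().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bookkeeping standing in for the XArray lock and LRU isolation refs. */
struct collapse_model {
	bool xa_locked;
	int isolated_refs;
};

/* Hypothetical helpers; each mimics the bookkeeping effect of the named call. */
void model_xas_lock_irq(struct collapse_model *s)
{
	assert(!s->xa_locked);
	s->xa_locked = true;
}

void model_xas_unlock_irq(struct collapse_model *s)
{
	assert(s->xa_locked);
	s->xa_locked = false;
}

void model_folio_isolate_lru(struct collapse_model *s)
{
	s->isolated_refs++;
}

void model_folio_putback_lru(struct collapse_model *s)
{
	assert(s->isolated_refs > 0);
	s->isolated_refs--;
}

/* Returns 0 when the folio stays a collapse candidate, -1 when it is dirty. */
int check_candidate(struct collapse_model *s, bool folio_dirty)
{
	int result = 0;

	model_folio_isolate_lru(s);
	model_xas_lock_irq(s);

	if (folio_dirty) {
		result = -1;
		/* The fix: undo both steps before jumping out. */
		model_xas_unlock_irq(s);
		model_folio_putback_lru(s);
		goto out;
	}

	/* Simplification: drop the lock, keep the folio isolated for the collapse. */
	model_xas_unlock_irq(s);
out:
	return result;
}
```

Without the two undo calls in the dirty branch, the model would return with
xa_locked still set and isolated_refs at 1, which corresponds to the held XA
lock and the leaked isolation ref described in the item.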

From V2[3]:
1. removed unnecessary check in collapse_scan_file().

2. removed inode_is_open_for_write() check in file_thp_enabled().

3. changed hugepage_enabled() to return true if khugepaged global
   control is on instead of false. cleaned up anon and shmem code in the
   function.

4. moved folio dirtiness check after try_to_unmap() but before
   try_to_unmap_flush(), since that is sufficient to prevent new writes.

5. reordered patch 4 and 5, so that khugepaged behavior does not change
   after READ_ONLY_THP_FOR_FS is removed.

6. added read-write file test in khugepaged selftest.

7. removed the read-only file restriction from guard-region selftest.

From V1[2]:
1. removed inode_is_open_for_write() check in collapse_file(), since the
   added folio dirtiness check after try_to_unmap_flush() should be
   sufficient to prevent writes to candidate folios.

2. removed READ_ONLY_THP_FOR_FS check in hugepage_enabled(); please
   see Patch 5 and item 2 in the overview for more details.

3. moved the patch removing READ_ONLY_THP_FOR_FS Kconfig after enabling
   khugepaged and MADV_COLLAPSE to create read-only THPs.

4. added mapping_pmd_thp_support() helper function.

5. used VM_WARN_ON_ONCE() in collapse_file() for mapping eligibility check
   and address alignment check instead of if + return error code. Always
   allow shmem, since MADV_COLLAPSE ignores shmem huge config.

6. added mapping eligibility check in collapse_scan_file().

7. removed trailing ; for folio_split() in the !CONFIG_TRANSPARENT_HUGEPAGE
   case.

8. simplified code in folio_check_splittable() after removing
   READ_ONLY_THP_FOR_FS code.

9. clarified that read-only THP works for FSes with PMD THP support by
   default.

From RFC[1]:
1. instead of removing the READ_ONLY_THP_FOR_FS functionality entirely,
   turn it on by default for all FSes with large folio support whose
   supported orders include PMD_ORDER.

Suggestions and comments are welcome.

Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20260327014255.2058916-1-ziy@nvidia.com/ [2]
Link: https://lore.kernel.org/all/20260413192030.3275825-1-ziy@nvidia.com/ [3]
Link: https://lore.kernel.org/all/20260418024429.4055056-1-ziy@nvidia.com/ [4]
Link: https://lore.kernel.org/all/20260424024915.28758-1-ziy@nvidia.com/ [5]

Zi Yan (14):
  mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  mm/khugepaged: add folio dirty check after try_to_unmap()
  mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled()
  mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  mm: fs: remove filemap_nr_thps*() functions and their users
  fs: remove nr_thps from struct address_space
  mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  mm/truncate: use folio_split() in truncate_inode_partial_folio()
  fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions
  mm/khugepaged: enable clean pagecache folio collapse for writable
    files
  selftests/mm: add writable-file collapse tests for khugepaged

 fs/btrfs/defrag.c                          |   3 -
 fs/inode.c                                 |   3 -
 fs/open.c                                  |  27 ---
 include/linux/fs.h                         |   5 -
 include/linux/huge_mm.h                    |  25 +--
 include/linux/pagemap.h                    |  49 +++---
 include/linux/shmem_fs.h                   |   2 +-
 mm/Kconfig                                 |  11 --
 mm/filemap.c                               |   1 -
 mm/huge_memory.c                           |  39 +----
 mm/khugepaged.c                            | 101 ++++++-----
 mm/truncate.c                              |   8 +-
 tools/testing/selftests/mm/guard-regions.c |  18 +-
 tools/testing/selftests/mm/khugepaged.c    | 190 ++++++++++++++++-----
 tools/testing/selftests/mm/run_vmtests.sh  |  12 +-
 15 files changed, 258 insertions(+), 236 deletions(-)

-- 
2.53.0



end of thread, other threads:[~2026-05-06  5:24 UTC | newest]

Thread overview: 32+ messages
2026-04-29 15:29 [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Zi Yan
2026-04-29 15:29 ` [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-30 14:37   ` Zi Yan
2026-04-30 15:04     ` Andrew Morton
2026-05-04  3:48   ` Nico Pache
2026-04-29 15:29 ` [PATCH v5 02/14] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
2026-04-30 15:11   ` Zi Yan
2026-05-04  3:53   ` Nico Pache
2026-05-06  5:23   ` Lance Yang
2026-04-29 15:29 ` [PATCH v5 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-05-04  3:57   ` Nico Pache
2026-04-29 15:29 ` [PATCH v5 04/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled() Zi Yan
2026-05-04  4:00   ` Nico Pache
2026-04-29 15:35 ` [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-05-04  4:02   ` Nico Pache
2026-04-29 15:35 ` [PATCH v5 06/14] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-04-29 15:35 ` [PATCH v5 07/14] fs: remove nr_thps from struct address_space Zi Yan
2026-05-04  4:11   ` Nico Pache
2026-04-29 15:35 ` [PATCH v5 08/14] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-29 15:35 ` [PATCH v5 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-30 15:12   ` Zi Yan
2026-04-29 15:35 ` [PATCH v5 10/14] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-29 15:35 ` [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-30 15:16   ` Zi Yan
2026-04-30 15:27     ` Zi Yan
2026-05-04  4:23   ` Nico Pache
2026-05-04 10:11   ` Nico Pache
2026-04-29 15:35 ` [PATCH v5 12/14] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-04-29 15:35 ` [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files Zi Yan
2026-04-30 15:18   ` Zi Yan
2026-04-29 15:35 ` [PATCH v5 14/14] selftests/mm: add writable-file collapse tests for khugepaged Zi Yan
2026-04-29 16:13 ` [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Andrew Morton
