public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig
@ 2026-03-27  1:42 Zi Yan
  2026-03-27  1:42 ` [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Hi all,

This patchset removes the READ_ONLY_THP_FOR_FS Kconfig option and enables
creating read-only THPs by default for FSes with large folio support (the
supported orders need to include PMD_ORDER).

The changes are:
1. collapse_file() in mm/khugepaged.c, instead of checking
   CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
   of the file's struct address_space is at least PMD_ORDER.
2. file_thp_enabled() also checks mapping_max_folio_order() instead.
3. truncate_inode_partial_folio() calls folio_split() directly instead
   of the removed try_folio_split_to_order(), since large folios can
   only show up on a FS with large folio support.
4. nr_thps is removed from struct address_space, since there is no longer
   a need to drop all read-only THPs from a FS without large folio
   support when the fd becomes writable. The related filemap_nr_thps*()
   helpers are removed too.
5. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
6. Updated comments in various places.

Changelog
===
From RFC[1]:
1. instead of removing the READ_ONLY_THP_FOR_FS functionality entirely,
   turn it on by default for all FSes with large folio support whose
   supported orders include PMD_ORDER.

Suggestions and comments are welcome.

Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@nvidia.com/ [1]

Zi Yan (10):
  mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  mm: fs: remove filemap_nr_thps*() functions and their users
  fs: remove nr_thps from struct address_space
  mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  mm/truncate: use folio_split() in truncate_inode_partial_folio()
  fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in
    guard-regions

 fs/btrfs/defrag.c                          |  3 --
 fs/inode.c                                 |  3 --
 fs/open.c                                  | 27 ----------------
 include/linux/fs.h                         |  5 ---
 include/linux/huge_mm.h                    | 25 ++-------------
 include/linux/pagemap.h                    | 29 -----------------
 mm/Kconfig                                 | 11 -------
 mm/filemap.c                               |  1 -
 mm/huge_memory.c                           | 29 ++---------------
 mm/khugepaged.c                            | 36 +++++-----------------
 mm/truncate.c                              |  8 ++---
 tools/testing/selftests/mm/guard-regions.c |  9 +++---
 tools/testing/selftests/mm/khugepaged.c    |  4 +--
 13 files changed, 23 insertions(+), 167 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 11:45   ` Lorenzo Stoakes (Oracle)
  2026-03-27 13:33   ` David Hildenbrand (Arm)
  2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

With the Kconfig option removed, no one will be able to use it, so the
related code can be removed in the coming commits.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/Kconfig | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index bd283958d675..408fc7b82233 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -937,17 +937,6 @@ config THP_SWAP
 
 	  For selection by architectures with reasonable THP sizes.
 
-config READ_ONLY_THP_FOR_FS
-	bool "Read-only THP for filesystems (EXPERIMENTAL)"
-	depends on TRANSPARENT_HUGEPAGE
-
-	help
-	  Allow khugepaged to put read-only file-backed pages in THP.
-
-	  This is marked experimental because it is a new feature. Write
-	  support of file THPs will be developed in the next few release
-	  cycles.
-
 config NO_PAGE_MAPCOUNT
 	bool "No per-page mapcount (EXPERIMENTAL)"
 	help
-- 
2.43.0



* [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
  2026-03-27  1:42 ` [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27  7:29   ` Lance Yang
                     ` (3 more replies)
  2026-03-27  1:42 ` [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
                   ` (8 subsequent siblings)
  10 siblings, 4 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

collapse_file() requires a FS that supports large folios of at least
PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that requirement.
shmem with the huge option turned on also sets the large folio order on the
mapping, so the check applies to shmem as well.

While at it, replace the VM_BUG_ON()s with returned failure values.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d06d84219e1b..45b12ffb1550 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 	int nr_none = 0;
 	bool is_shmem = shmem_file(file);
 
-	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
-	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+	/* "huge" shmem sets mapping folio order and passes the check below */
+	if (mapping_max_folio_order(mapping) < PMD_ORDER)
+		return SCAN_FAIL;
+	if (start & (HPAGE_PMD_NR - 1))
+		return SCAN_ADDRESS_RANGE;
 
 	result = alloc_charge_folio(&new_folio, mm, cc);
 	if (result != SCAN_SUCCEED)
-- 
2.43.0



* [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
  2026-03-27  1:42 ` [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
  2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27  9:32   ` Lance Yang
  2026-03-27 12:23   ` Lorenzo Stoakes (Oracle)
  2026-03-27  1:42 ` [PATCH v1 04/10] fs: remove nr_thps from struct address_space Zi Yan
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
large folio support, so that read-only THPs created on these FSes are not
seen by the FSes once the underlying fd becomes writable. Now read-only PMD
THPs only appear on a FS with large folio support whose supported orders
include PMD_ORDER.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 fs/open.c               | 27 ---------------------------
 include/linux/pagemap.h | 29 -----------------------------
 mm/filemap.c            |  1 -
 mm/huge_memory.c        |  1 -
 mm/khugepaged.c         | 29 ++---------------------------
 5 files changed, 2 insertions(+), 85 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 91f1139591ab..cef382d9d8b8 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -970,33 +970,6 @@ static int do_dentry_open(struct file *f,
 	if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT))
 		return -EINVAL;
 
-	/*
-	 * XXX: Huge page cache doesn't support writing yet. Drop all page
-	 * cache for this file before processing writes.
-	 */
-	if (f->f_mode & FMODE_WRITE) {
-		/*
-		 * Depends on full fence from get_write_access() to synchronize
-		 * against collapse_file() regarding i_writecount and nr_thps
-		 * updates. Ensures subsequent insertion of THPs into the page
-		 * cache will fail.
-		 */
-		if (filemap_nr_thps(inode->i_mapping)) {
-			struct address_space *mapping = inode->i_mapping;
-
-			filemap_invalidate_lock(inode->i_mapping);
-			/*
-			 * unmap_mapping_range just need to be called once
-			 * here, because the private pages is not need to be
-			 * unmapped mapping (e.g. data segment of dynamic
-			 * shared libraries here).
-			 */
-			unmap_mapping_range(mapping, 0, 0, 0);
-			truncate_inode_pages(mapping, 0);
-			filemap_invalidate_unlock(inode->i_mapping);
-		}
-	}
-
 	return 0;
 
 cleanup_all:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..dad3f8846cdc 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -530,35 +530,6 @@ static inline size_t mapping_max_folio_size(const struct address_space *mapping)
 	return PAGE_SIZE << mapping_max_folio_order(mapping);
 }
 
-static inline int filemap_nr_thps(const struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	return atomic_read(&mapping->nr_thps);
-#else
-	return 0;
-#endif
-}
-
-static inline void filemap_nr_thps_inc(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	if (!mapping_large_folio_support(mapping))
-		atomic_inc(&mapping->nr_thps);
-#else
-	WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
-static inline void filemap_nr_thps_dec(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	if (!mapping_large_folio_support(mapping))
-		atomic_dec(&mapping->nr_thps);
-#else
-	WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
 struct address_space *folio_mapping(const struct folio *folio);
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 2b933a1da9bd..4248e7cdecf3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -189,7 +189,6 @@ static void filemap_unaccount_folio(struct address_space *mapping,
 			lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
 	} else if (folio_test_pmd_mappable(folio)) {
 		lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
-		filemap_nr_thps_dec(mapping);
 	}
 	if (test_bit(AS_KERNEL_FILE, &folio->mapping->flags))
 		mod_node_page_state(folio_pgdat(folio),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b2a6060b3c20..c7873dbdc470 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3833,7 +3833,6 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 				} else {
 					lruvec_stat_mod_folio(folio,
 							NR_FILE_THPS, -nr);
-					filemap_nr_thps_dec(mapping);
 				}
 			}
 		}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 45b12ffb1550..8004ab8de6d2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2104,20 +2104,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		goto xa_unlocked;
 	}
 
-	if (!is_shmem) {
-		filemap_nr_thps_inc(mapping);
-		/*
-		 * Paired with the fence in do_dentry_open() -> get_write_access()
-		 * to ensure i_writecount is up to date and the update to nr_thps
-		 * is visible. Ensures the page cache will be truncated if the
-		 * file is opened writable.
-		 */
-		smp_mb();
-		if (inode_is_open_for_write(mapping->host)) {
-			result = SCAN_FAIL;
-			filemap_nr_thps_dec(mapping);
-		}
-	}
+	if (!is_shmem && inode_is_open_for_write(mapping->host))
+		result = SCAN_FAIL;
 
 xa_locked:
 	xas_unlock_irq(&xas);
@@ -2296,19 +2284,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		folio_putback_lru(folio);
 		folio_put(folio);
 	}
-	/*
-	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
-	 * file only. This undo is not needed unless failure is
-	 * due to SCAN_COPY_MC.
-	 */
-	if (!is_shmem && result == SCAN_COPY_MC) {
-		filemap_nr_thps_dec(mapping);
-		/*
-		 * Paired with the fence in do_dentry_open() -> get_write_access()
-		 * to ensure the update to nr_thps is visible.
-		 */
-		smp_mb();
-	}
 
 	new_folio->mapping = NULL;
 
-- 
2.43.0



* [PATCH v1 04/10] fs: remove nr_thps from struct address_space
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (2 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 12:29   ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:00   ` David Hildenbrand (Arm)
  2026-03-27  1:42 ` [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Now that filemap_nr_thps*() are removed, the related field,
address_space->nr_thps, is no longer needed. Remove it.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 fs/inode.c         | 3 ---
 include/linux/fs.h | 5 -----
 2 files changed, 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index cc12b68e021b..16ab0a345419 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -280,9 +280,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
 	mapping->flags = 0;
 	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	atomic_set(&mapping->nr_thps, 0);
-#endif
 	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
 	mapping->i_private_data = NULL;
 	mapping->writeback_index = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0bdccfa70b44..35875696fb4c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -455,7 +455,6 @@ extern const struct address_space_operations empty_aops;
  *   memory mappings.
  * @gfp_mask: Memory allocation flags to use for allocating pages.
  * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
- * @nr_thps: Number of THPs in the pagecache (non-shmem only).
  * @i_mmap: Tree of private and shared mappings.
  * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
  * @nrpages: Number of page entries, protected by the i_pages lock.
@@ -473,10 +472,6 @@ struct address_space {
 	struct rw_semaphore	invalidate_lock;
 	gfp_t			gfp_mask;
 	atomic_t		i_mmap_writable;
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	/* number of thp, only for non-shmem files */
-	atomic_t		nr_thps;
-#endif
 	struct rb_root_cached	i_mmap;
 	unsigned long		nrpages;
 	pgoff_t			writeback_index;
-- 
2.43.0



* [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (3 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 04/10] fs: remove nr_thps from struct address_space Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 12:42   ` Lorenzo Stoakes (Oracle)
  2026-03-27  1:42 ` [PATCH v1 06/10] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Replace it with a check on the max folio order of the file's address space
mapping, making sure PMD_ORDER is supported.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c7873dbdc470..1da1467328a3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 {
 	struct inode *inode;
 
-	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
-		return false;
-
 	if (!vma->vm_file)
 		return false;
 
@@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	if (IS_ANON_FILE(inode))
 		return false;
 
+	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
-- 
2.43.0



* [PATCH v1 06/10] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (4 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 12:50   ` Lorenzo Stoakes (Oracle)
  2026-03-27  1:42 ` [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Without READ_ONLY_THP_FOR_FS, large file-backed folios can no longer show
up on a FS without large folio support, so the check is no longer needed.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1da1467328a3..30eddcbf86f1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3732,28 +3732,6 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
 		/* order-1 is not supported for anonymous THP. */
 		if (new_order == 1)
 			return -EINVAL;
-	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
-		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-		    !mapping_large_folio_support(folio->mapping)) {
-			/*
-			 * We can always split a folio down to a single page
-			 * (new_order == 0) uniformly.
-			 *
-			 * For any other scenario
-			 *   a) uniform split targeting a large folio
-			 *      (new_order > 0)
-			 *   b) any non-uniform split
-			 * we must confirm that the file system supports large
-			 * folios.
-			 *
-			 * Note that we might still have THPs in such
-			 * mappings, which is created from khugepaged when
-			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
-			 * case, the mapping does not actually support large
-			 * folios properly.
-			 */
-			return -EINVAL;
-		}
 	}
 
 	/*
-- 
2.43.0



* [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio()
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (5 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 06/10] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27  3:33   ` Lance Yang
  2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  2026-03-27  1:42 ` [PATCH v1 08/10] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

After READ_ONLY_THP_FOR_FS is removed, a FS either supports large folios or
it does not. folio_split() can be used directly on a FS with large folio
support, without worrying about encountering a THP on a FS that lacks it.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/huge_mm.h | 25 ++-----------------------
 mm/truncate.c           |  8 ++++----
 2 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1258fa37e85b..171de8138e98 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -389,27 +389,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
 	return split_huge_page_to_list_to_order(page, NULL, new_order);
 }
 
-/**
- * try_folio_split_to_order() - try to split a @folio at @page to @new_order
- * using non uniform split.
- * @folio: folio to be split
- * @page: split to @new_order at the given page
- * @new_order: the target split order
- *
- * Try to split a @folio at @page using non uniform split to @new_order, if
- * non uniform split is not supported, fall back to uniform split. After-split
- * folios are put back to LRU list. Use min_order_for_split() to get the lower
- * bound of @new_order.
- *
- * Return: 0 - split is successful, otherwise split failed.
- */
-static inline int try_folio_split_to_order(struct folio *folio,
-		struct page *page, unsigned int new_order)
-{
-	if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
-		return split_huge_page_to_order(&folio->page, new_order);
-	return folio_split(folio, new_order, page, NULL);
-}
 static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list_to_order(page, NULL, 0);
@@ -641,8 +620,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
 	return -EINVAL;
 }
 
-static inline int try_folio_split_to_order(struct folio *folio,
-		struct page *page, unsigned int new_order)
+static inline int folio_split(struct folio *folio, unsigned int new_order,
+		struct page *page, struct list_head *list)
 {
 	VM_WARN_ON_ONCE_FOLIO(1, folio);
 	return -EINVAL;
diff --git a/mm/truncate.c b/mm/truncate.c
index 2931d66c16d0..6973b05ec4b8 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -177,7 +177,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 	return 0;
 }
 
-static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
+static int folio_split_or_unmap(struct folio *folio, struct page *split_at,
 				    unsigned long min_order)
 {
 	enum ttu_flags ttu_flags =
@@ -186,7 +186,7 @@ static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
 		TTU_IGNORE_MLOCK;
 	int ret;
 
-	ret = try_folio_split_to_order(folio, split_at, min_order);
+	ret = folio_split(folio, min_order, split_at, NULL);
 
 	/*
 	 * If the split fails, unmap the folio, so it will be refaulted
@@ -252,7 +252,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 
 	min_order = mapping_min_folio_order(folio->mapping);
 	split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
-	if (!try_folio_split_or_unmap(folio, split_at, min_order)) {
+	if (!folio_split_or_unmap(folio, split_at, min_order)) {
 		/*
 		 * try to split at offset + length to make sure folios within
 		 * the range can be dropped, especially to avoid memory waste
@@ -279,7 +279,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 		/* make sure folio2 is large and does not change its mapping */
 		if (folio_test_large(folio2) &&
 		    folio2->mapping == folio->mapping)
-			try_folio_split_or_unmap(folio2, split_at2, min_order);
+			folio_split_or_unmap(folio2, split_at2, min_order);
 
 		folio_unlock(folio2);
 out:
-- 
2.43.0



* [PATCH v1 08/10] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (6 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  2026-03-27  1:42 ` [PATCH v1 09/10] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

READ_ONLY_THP_FOR_FS is no longer present, so remove the comment referring
to it.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/defrag.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 7e2db5d3a4d4..a8d49d9ca981 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -860,9 +860,6 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t
 		return folio;
 
 	/*
-	 * Since we can defragment files opened read-only, we can encounter
-	 * transparent huge pages here (see CONFIG_READ_ONLY_THP_FOR_FS).
-	 *
 	 * The IO for such large folios is not fully tested, thus return
 	 * an error to reject such folios unless it's an experimental build.
 	 *
-- 
2.43.0



* [PATCH v1 09/10] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (7 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 08/10] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  2026-03-27  1:42 ` [PATCH v1 10/10] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
  2026-03-27 13:46 ` [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig David Hildenbrand (Arm)
  10 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Change the stated requirement to a file system with large folio support
whose supported orders include PMD_ORDER.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 tools/testing/selftests/mm/khugepaged.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 3fe7ef04ac62..bdcdd31beb1e 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -1086,8 +1086,8 @@ static void usage(void)
 	fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
 	fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
 	fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
-	fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
-	fprintf(stderr,	"\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
+	fprintf(stderr, "\n\t\"file,all\" mem_type requires a file system\n");
+	fprintf(stderr,	"\twith large folio support (order >= PMD order)\n");
 	fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
 	fprintf(stderr,	"\tmounted with huge=advise option for khugepaged tests to work\n");
 	fprintf(stderr,	"\n\tSupported Options:\n");
-- 
2.43.0



* [PATCH v1 10/10] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (8 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 09/10] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
@ 2026-03-27  1:42 ` Zi Yan
  2026-03-27 13:06   ` Lorenzo Stoakes (Oracle)
  2026-03-27 13:46 ` [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig David Hildenbrand (Arm)
  10 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27  1:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Any file system with large folio support whose supported orders include
PMD_ORDER can be used.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 tools/testing/selftests/mm/guard-regions.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
index 48e8b1539be3..13e77e48b6ef 100644
--- a/tools/testing/selftests/mm/guard-regions.c
+++ b/tools/testing/selftests/mm/guard-regions.c
@@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
 
 	/*
 	 * We must close and re-open local-file backed as read-only for
-	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
+	 * MADV_COLLAPSE to work.
 	 */
 	if (variant->backing == LOCAL_FILE_BACKED) {
 		ASSERT_EQ(close(self->fd), 0);
@@ -2237,9 +2237,10 @@ TEST_F(guard_regions, collapse)
 	/*
 	 * Now collapse the entire region. This should fail in all cases.
 	 *
-	 * The madvise() call will also fail if CONFIG_READ_ONLY_THP_FOR_FS is
-	 * not set for the local file case, but we can't differentiate whether
-	 * this occurred or if the collapse was rightly rejected.
+	 * The madvise() call will also fail if the file system does not support
+	 * large folio or the supported orders do not include PMD_ORDER for the
+	 * local file case, but we can't differentiate whether this occurred or
+	 * if the collapse was rightly rejected.
 	 */
 	EXPECT_NE(madvise(ptr, size, MADV_COLLAPSE), 0);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio()
  2026-03-27  1:42 ` [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
@ 2026-03-27  3:33   ` Lance Yang
  2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 55+ messages in thread
From: Lance Yang @ 2026-03-27  3:33 UTC (permalink / raw)
  To: Zi Yan
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Baolin Wang, Matthew Wilcox (Oracle), Liam R. Howlett, Nico Pache,
	Song Liu, Ryan Roberts, Dev Jain, Barry Song, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest



On 2026/3/27 09:42, Zi Yan wrote:
> After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or
> not. folio_split() can be used on a FS with large folio support without
> worrying about getting a THP on a FS without large folio support.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   include/linux/huge_mm.h | 25 ++-----------------------
>   mm/truncate.c           |  8 ++++----
>   2 files changed, 6 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1258fa37e85b..171de8138e98 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -389,27 +389,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>   	return split_huge_page_to_list_to_order(page, NULL, new_order);
>   }
>   
> -/**
> - * try_folio_split_to_order() - try to split a @folio at @page to @new_order
> - * using non uniform split.
> - * @folio: folio to be split
> - * @page: split to @new_order at the given page
> - * @new_order: the target split order
> - *
> - * Try to split a @folio at @page using non uniform split to @new_order, if
> - * non uniform split is not supported, fall back to uniform split. After-split
> - * folios are put back to LRU list. Use min_order_for_split() to get the lower
> - * bound of @new_order.
> - *
> - * Return: 0 - split is successful, otherwise split failed.
> - */
> -static inline int try_folio_split_to_order(struct folio *folio,
> -		struct page *page, unsigned int new_order)
> -{
> -	if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
> -		return split_huge_page_to_order(&folio->page, new_order);
> -	return folio_split(folio, new_order, page, NULL);
> -}
>   static inline int split_huge_page(struct page *page)
>   {
>   	return split_huge_page_to_list_to_order(page, NULL, 0);
> @@ -641,8 +620,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
>   	return -EINVAL;
>   }
>   
> -static inline int try_folio_split_to_order(struct folio *folio,
> -		struct page *page, unsigned int new_order)
> +static inline int folio_split(struct folio *folio, unsigned int new_order,
> +		struct page *page, struct list_head *list);

Ouch, that ';' wasn't supposed to be there, right?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-03-27  7:29   ` Lance Yang
  2026-03-27  7:35     ` Lance Yang
  2026-03-27  9:44   ` Baolin Wang
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 55+ messages in thread
From: Lance Yang @ 2026-03-27  7:29 UTC (permalink / raw)
  To: ziy
  Cc: willy, songliubraving, clm, dsterba, viro, brauner, jack, akpm,
	david, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest


On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote:
>collapse_file() requires FSes supporting large folio with at least
>PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>huge option turned on also sets large folio order on mapping, so the check
>also applies to shmem.
>
>While at it, replace VM_BUG_ON with returning failure values.
>
>Signed-off-by: Zi Yan <ziy@nvidia.com>
>---
> mm/khugepaged.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index d06d84219e1b..45b12ffb1550 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 	int nr_none = 0;
> 	bool is_shmem = shmem_file(file);
> 
>-	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>-	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>+	/* "huge" shmem sets mapping folio order and passes the check below */
>+	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>+		return SCAN_FAIL;

Yep, for shmem inodes, if the mount has huge= enabled, inode creation
marks the mapping are large-folio capable:

	/* Don't consider 'deny' for emergencies and 'force' for testing */
	if (sbinfo->huge)
		mapping_set_large_folios(inode->i_mapping);

LGTM!

Reviewed-by: Lance Yang <lance.yang@linux.dev>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  7:29   ` Lance Yang
@ 2026-03-27  7:35     ` Lance Yang
  0 siblings, 0 replies; 55+ messages in thread
From: Lance Yang @ 2026-03-27  7:35 UTC (permalink / raw)
  To: ziy
  Cc: willy, songliubraving, clm, dsterba, viro, brauner, jack, akpm,
	david, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, vbabka, rppt, surenb, mhocko, shuah,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest



On 2026/3/27 15:29, Lance Yang wrote:
> 
> On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>> huge option turned on also sets large folio order on mapping, so the check
>> also applies to shmem.
>>
>> While at it, replace VM_BUG_ON with returning failure values.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> mm/khugepaged.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d06d84219e1b..45b12ffb1550 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>> 	int nr_none = 0;
>> 	bool is_shmem = shmem_file(file);
>>
>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>> +		return SCAN_FAIL;
> 
> Yep, for shmem inodes, if the mount has huge= enabled, inode creation
> marks the mapping are large-folio capable:

Oops, s/are/as/

> 
> 	/* Don't consider 'deny' for emergencies and 'force' for testing */
> 	if (sbinfo->huge)
> 		mapping_set_large_folios(inode->i_mapping);
> 
> LGTM!
> 
> Reviewed-by: Lance Yang <lance.yang@linux.dev>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-03-27  1:42 ` [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
@ 2026-03-27  9:32   ` Lance Yang
  2026-03-27 12:23   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 55+ messages in thread
From: Lance Yang @ 2026-03-27  9:32 UTC (permalink / raw)
  To: Zi Yan
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Baolin Wang, Song Liu, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest,
	Matthew Wilcox (Oracle)



On 2026/3/27 09:42, Zi Yan wrote:
> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
> large folio support, so that read-only THPs created in these FSes are not
> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
> THPs only appear in a FS with large folio support and the supported orders
> include PMD_ORDRE.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

LGTM, feel free to add:

Reviewed-by: Lance Yang <lance.yang@linux.dev>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
  2026-03-27  7:29   ` Lance Yang
@ 2026-03-27  9:44   ` Baolin Wang
  2026-03-27 12:02     ` Lorenzo Stoakes (Oracle)
  2026-03-27 12:07   ` Lorenzo Stoakes (Oracle)
  2026-03-27 13:37   ` David Hildenbrand (Arm)
  3 siblings, 1 reply; 55+ messages in thread
From: Baolin Wang @ 2026-03-27  9:44 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest



On 3/27/26 9:42 AM, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> huge option turned on also sets large folio order on mapping, so the check
> also applies to shmem.
> 
> While at it, replace VM_BUG_ON with returning failure values.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/khugepaged.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d06d84219e1b..45b12ffb1550 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>   	int nr_none = 0;
>   	bool is_shmem = shmem_file(file);
>   
> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> +	/* "huge" shmem sets mapping folio order and passes the check below */
> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
> +		return SCAN_FAIL;

This is not true for anonymous shmem, since its large order allocation 
logic is similar to anonymous memory. That means it will not call 
mapping_set_large_folios() for anonymous shmem.

So I think the check should be:

if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
      return SCAN_FAIL;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  2026-03-27  1:42 ` [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
@ 2026-03-27 11:45   ` Lorenzo Stoakes (Oracle)
  2026-03-27 13:33   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 11:45 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:46PM -0400, Zi Yan wrote:
> No one will be able to use it, so the related code can be removed in the
> coming commits.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Seems a reasonable ordering, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/Kconfig | 11 -----------
>  1 file changed, 11 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index bd283958d675..408fc7b82233 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -937,17 +937,6 @@ config THP_SWAP
>
>  	  For selection by architectures with reasonable THP sizes.
>
> -config READ_ONLY_THP_FOR_FS
> -	bool "Read-only THP for filesystems (EXPERIMENTAL)"
> -	depends on TRANSPARENT_HUGEPAGE
> -
> -	help
> -	  Allow khugepaged to put read-only file-backed pages in THP.
> -
> -	  This is marked experimental because it is a new feature. Write
> -	  support of file THPs will be developed in the next few release
> -	  cycles.
> -
>  config NO_PAGE_MAPCOUNT
>  	bool "No per-page mapcount (EXPERIMENTAL)"
>  	help
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  9:44   ` Baolin Wang
@ 2026-03-27 12:02     ` Lorenzo Stoakes (Oracle)
  2026-03-27 13:45       ` Baolin Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 12:02 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, David Hildenbrand, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>
>
> On 3/27/26 9:42 AM, Zi Yan wrote:
> > collapse_file() requires FSes supporting large folio with at least
> > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> > huge option turned on also sets large folio order on mapping, so the check
> > also applies to shmem.
> >
> > While at it, replace VM_BUG_ON with returning failure values.
> >
> > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > ---
> >   mm/khugepaged.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d06d84219e1b..45b12ffb1550 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> >   	int nr_none = 0;
> >   	bool is_shmem = shmem_file(file);
> > -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> > +	/* "huge" shmem sets mapping folio order and passes the check below */
> > +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > +		return SCAN_FAIL;
>
> This is not true for anonymous shmem, since its large order allocation logic
> is similar to anonymous memory. That means it will not call
> mapping_set_large_folios() for anonymous shmem.
>
> So I think the check should be:
>
> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>      return SCAN_FAIL;

Hmm but in shmem_init() we have:

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
	else
		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */

	/*
	 * Default to setting PMD-sized THP to inherit the global setting and
	 * disable all other multi-size THPs.
	 */
	if (!shmem_orders_configured)
		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
#endif

And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
shmem_enabled_store() updates that if necessary.

So we're still fine right?

__shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
__shmem_get_inode() which has:

	if (sbinfo->huge)
		mapping_set_large_folios(inode->i_mapping);

Shared for both anon shmem and tmpfs-style shmem.

So I think it's fine as-is.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
  2026-03-27  7:29   ` Lance Yang
  2026-03-27  9:44   ` Baolin Wang
@ 2026-03-27 12:07   ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:15     ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:46     ` Zi Yan
  2026-03-27 13:37   ` David Hildenbrand (Arm)
  3 siblings, 2 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 12:07 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> huge option turned on also sets large folio order on mapping, so the check
> also applies to shmem.
>
> While at it, replace VM_BUG_ON with returning failure values.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>



> ---
>  mm/khugepaged.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d06d84219e1b..45b12ffb1550 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  	int nr_none = 0;
>  	bool is_shmem = shmem_file(file);
>
> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> +	/* "huge" shmem sets mapping folio order and passes the check below */

I think this isn't quite clear and could be improved to e.g.:

	/*
	 * Either anon shmem supports huge pages as set by shmem_enabled sysfs,
	 * or a shmem file system mounted with the "huge" option.
	 */

> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
> +		return SCAN_FAIL;

As per rest of thread, this looks correct.

> +	if (start & (HPAGE_PMD_NR - 1))
> +		return SCAN_ADDRESS_RANGE;

Hmm, we're kinda making this - presumably buggy situation - into a valid input
that just fails the scan.

Maybe just make it a VM_WARN_ON_ONCE()? Or if we want to avoid propagating the
bug that'd cause it any further:

	if (start & (HPAGE_PMD_NR - 1)) {
		VM_WARN_ON_ONCE(true);
		return SCAN_ADDRESS_RANGE;
	}

Or similar.

>
>  	result = alloc_charge_folio(&new_folio, mm, cc);
>  	if (result != SCAN_SUCCEED)
> --
> 2.43.0
>

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-03-27  1:42 ` [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
  2026-03-27  9:32   ` Lance Yang
@ 2026-03-27 12:23   ` Lorenzo Stoakes (Oracle)
  2026-03-27 13:58     ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 12:23 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:48PM -0400, Zi Yan wrote:
> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
> large folio support, so that read-only THPs created in these FSes are not
> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
> THPs only appear in a FS with large folio support and the supported orders
> include PMD_ORDRE.

Typo: PMD_ORDRE -> PMD_ORDER

>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

This looks obviously-correct since this stuff wouldn't have been invoked for
large folio file systems before + they already had to handle it separately, and
this function is only tied to CONFIG_READ_ONLY_THP_FOR_FS (+ a quick grep
suggests you didn't miss anything), so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  fs/open.c               | 27 ---------------------------
>  include/linux/pagemap.h | 29 -----------------------------
>  mm/filemap.c            |  1 -
>  mm/huge_memory.c        |  1 -
>  mm/khugepaged.c         | 29 ++---------------------------
>  5 files changed, 2 insertions(+), 85 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 91f1139591ab..cef382d9d8b8 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -970,33 +970,6 @@ static int do_dentry_open(struct file *f,
>  	if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT))
>  		return -EINVAL;
>
> -	/*
> -	 * XXX: Huge page cache doesn't support writing yet. Drop all page
> -	 * cache for this file before processing writes.
> -	 */
> -	if (f->f_mode & FMODE_WRITE) {
> -		/*
> -		 * Depends on full fence from get_write_access() to synchronize
> -		 * against collapse_file() regarding i_writecount and nr_thps
> -		 * updates. Ensures subsequent insertion of THPs into the page
> -		 * cache will fail.
> -		 */
> -		if (filemap_nr_thps(inode->i_mapping)) {
> -			struct address_space *mapping = inode->i_mapping;
> -
> -			filemap_invalidate_lock(inode->i_mapping);
> -			/*
> -			 * unmap_mapping_range just need to be called once
> -			 * here, because the private pages is not need to be
> -			 * unmapped mapping (e.g. data segment of dynamic
> -			 * shared libraries here).
> -			 */
> -			unmap_mapping_range(mapping, 0, 0, 0);
> -			truncate_inode_pages(mapping, 0);
> -			filemap_invalidate_unlock(inode->i_mapping);
> -		}
> -	}
> -
>  	return 0;
>
>  cleanup_all:
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index ec442af3f886..dad3f8846cdc 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -530,35 +530,6 @@ static inline size_t mapping_max_folio_size(const struct address_space *mapping)
>  	return PAGE_SIZE << mapping_max_folio_order(mapping);
>  }
>
> -static inline int filemap_nr_thps(const struct address_space *mapping)
> -{
> -#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> -	return atomic_read(&mapping->nr_thps);
> -#else
> -	return 0;
> -#endif
> -}
> -
> -static inline void filemap_nr_thps_inc(struct address_space *mapping)
> -{
> -#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> -	if (!mapping_large_folio_support(mapping))
> -		atomic_inc(&mapping->nr_thps);
> -#else
> -	WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
> -#endif
> -}
> -
> -static inline void filemap_nr_thps_dec(struct address_space *mapping)
> -{
> -#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> -	if (!mapping_large_folio_support(mapping))
> -		atomic_dec(&mapping->nr_thps);
> -#else
> -	WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
> -#endif
> -}
> -
>  struct address_space *folio_mapping(const struct folio *folio);
>
>  /**
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2b933a1da9bd..4248e7cdecf3 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -189,7 +189,6 @@ static void filemap_unaccount_folio(struct address_space *mapping,
>  			lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
>  	} else if (folio_test_pmd_mappable(folio)) {
>  		lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
> -		filemap_nr_thps_dec(mapping);
>  	}
>  	if (test_bit(AS_KERNEL_FILE, &folio->mapping->flags))
>  		mod_node_page_state(folio_pgdat(folio),
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b2a6060b3c20..c7873dbdc470 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3833,7 +3833,6 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>  				} else {
>  					lruvec_stat_mod_folio(folio,
>  							NR_FILE_THPS, -nr);
> -					filemap_nr_thps_dec(mapping);
>  				}
>  			}
>  		}
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 45b12ffb1550..8004ab8de6d2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2104,20 +2104,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		goto xa_unlocked;
>  	}
>
> -	if (!is_shmem) {
> -		filemap_nr_thps_inc(mapping);
> -		/*
> -		 * Paired with the fence in do_dentry_open() -> get_write_access()
> -		 * to ensure i_writecount is up to date and the update to nr_thps
> -		 * is visible. Ensures the page cache will be truncated if the
> -		 * file is opened writable.
> -		 */
> -		smp_mb();
> -		if (inode_is_open_for_write(mapping->host)) {
> -			result = SCAN_FAIL;
> -			filemap_nr_thps_dec(mapping);
> -		}
> -	}
> +	if (!is_shmem && inode_is_open_for_write(mapping->host))
> +		result = SCAN_FAIL;
>
>  xa_locked:
>  	xas_unlock_irq(&xas);
> @@ -2296,19 +2284,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		folio_putback_lru(folio);
>  		folio_put(folio);
>  	}
> -	/*
> -	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
> -	 * file only. This undo is not needed unless failure is
> -	 * due to SCAN_COPY_MC.
> -	 */
> -	if (!is_shmem && result == SCAN_COPY_MC) {
> -		filemap_nr_thps_dec(mapping);
> -		/*
> -		 * Paired with the fence in do_dentry_open() -> get_write_access()
> -		 * to ensure the update to nr_thps is visible.
> -		 */
> -		smp_mb();
> -	}
>
>  	new_folio->mapping = NULL;
>
> --
> 2.43.0
>

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 04/10] fs: remove nr_thps from struct address_space
  2026-03-27  1:42 ` [PATCH v1 04/10] fs: remove nr_thps from struct address_space Zi Yan
@ 2026-03-27 12:29   ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:00   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 12:29 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:49PM -0400, Zi Yan wrote:
> filemap_nr_thps*() are removed, the related field, address_space->nr_thps,
> is no longer needed. Remove it.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

I wonder if we shouldn't squash this into previous actually, but it's fine
either way, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  fs/inode.c         | 3 ---
>  include/linux/fs.h | 5 -----
>  2 files changed, 8 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index cc12b68e021b..16ab0a345419 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -280,9 +280,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
>  	mapping->flags = 0;
>  	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
> -#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> -	atomic_set(&mapping->nr_thps, 0);
> -#endif
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->i_private_data = NULL;
>  	mapping->writeback_index = 0;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0bdccfa70b44..35875696fb4c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -455,7 +455,6 @@ extern const struct address_space_operations empty_aops;
>   *   memory mappings.
>   * @gfp_mask: Memory allocation flags to use for allocating pages.
>   * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
> - * @nr_thps: Number of THPs in the pagecache (non-shmem only).
>   * @i_mmap: Tree of private and shared mappings.
>   * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
>   * @nrpages: Number of page entries, protected by the i_pages lock.
> @@ -473,10 +472,6 @@ struct address_space {
>  	struct rw_semaphore	invalidate_lock;
>  	gfp_t			gfp_mask;
>  	atomic_t		i_mmap_writable;
> -#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> -	/* number of thp, only for non-shmem files */
> -	atomic_t		nr_thps;
> -#endif
>  	struct rb_root_cached	i_mmap;
>  	unsigned long		nrpages;
>  	pgoff_t			writeback_index;
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27  1:42 ` [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-03-27 12:42   ` Lorenzo Stoakes (Oracle)
  2026-03-27 15:12     ` Zi Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 12:42 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> Replace it with a check on the max folio order of the file's address space
> mapping, making sure PMD_ORDER is supported.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/huge_memory.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c7873dbdc470..1da1467328a3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  {
>  	struct inode *inode;
>
> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> -		return false;
> -
>  	if (!vma->vm_file)
>  		return false;
>
> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  	if (IS_ANON_FILE(inode))
>  		return false;
>
> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> +		return false;
> +

At this point I think this should be a separate function quite honestly and
share it with 2/10's use, and then you can put the comment in here re: anon
shmem etc.

Though that won't apply here of course as shmem_allowable_huge_orders() would
have been invoked :)

But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
unfortunate.

Buuut having said that is this right actually?

Because we have:

		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
			return orders;

Above it, and now you're enabling huge folio file systems to do non-page fault
THP and that's err... isn't that quite a big change?

So yeah probably no to this patch as is :) we should just drop
file_thp_enabled()?

>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 06/10] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  2026-03-27  1:42 ` [PATCH v1 06/10] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-03-27 12:50   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 12:50 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:51PM -0400, Zi Yan wrote:
> Without READ_ONLY_THP_FOR_FS, large file-backed folios cannot be created by
> a FS without large folio support. The check is no longer needed.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Seems legitimate, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/huge_memory.c | 22 ----------------------
>  1 file changed, 22 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1da1467328a3..30eddcbf86f1 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3732,28 +3732,6 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
>  		/* order-1 is not supported for anonymous THP. */
>  		if (new_order == 1)
>  			return -EINVAL;
> -	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
> -		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> -		    !mapping_large_folio_support(folio->mapping)) {
> -			/*
> -			 * We can always split a folio down to a single page
> -			 * (new_order == 0) uniformly.
> -			 *
> -			 * For any other scenario
> -			 *   a) uniform split targeting a large folio
> -			 *      (new_order > 0)
> -			 *   b) any non-uniform split
> -			 * we must confirm that the file system supports large
> -			 * folios.
> -			 *
> -			 * Note that we might still have THPs in such
> -			 * mappings, which is created from khugepaged when
> -			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
> -			 * case, the mapping does not actually support large
> -			 * folios properly.
> -			 */
> -			return -EINVAL;
> -		}
>  	}
>
>  	/*
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio()
  2026-03-27  1:42 ` [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
  2026-03-27  3:33   ` Lance Yang
@ 2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  2026-03-27 15:35     ` Zi Yan
  1 sibling, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 13:05 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:52PM -0400, Zi Yan wrote:
> After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or
> not. folio_split() can be used on a FS with large folio support without
> worrying about getting a THP on a FS without large folio support.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  include/linux/huge_mm.h | 25 ++-----------------------
>  mm/truncate.c           |  8 ++++----
>  2 files changed, 6 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1258fa37e85b..171de8138e98 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -389,27 +389,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>  	return split_huge_page_to_list_to_order(page, NULL, new_order);
>  }
>
> -/**
> - * try_folio_split_to_order() - try to split a @folio at @page to @new_order
> - * using non uniform split.
> - * @folio: folio to be split
> - * @page: split to @new_order at the given page
> - * @new_order: the target split order
> - *
> - * Try to split a @folio at @page using non uniform split to @new_order, if
> - * non uniform split is not supported, fall back to uniform split. After-split
> - * folios are put back to LRU list. Use min_order_for_split() to get the lower
> - * bound of @new_order.
> - *
> - * Return: 0 - split is successful, otherwise split failed.
> - */
> -static inline int try_folio_split_to_order(struct folio *folio,
> -		struct page *page, unsigned int new_order)
> -{
> -	if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
> -		return split_huge_page_to_order(&folio->page, new_order);
> -	return folio_split(folio, new_order, page, NULL);
> -}
>  static inline int split_huge_page(struct page *page)
>  {
>  	return split_huge_page_to_list_to_order(page, NULL, 0);
> @@ -641,8 +620,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
>  	return -EINVAL;
>  }

Hmm, there's nothing in the comment, or otherwise jumping out at me, to explain
why this is R/O thp file-backed only?

This seems like an arbitrary helper that just figures out whether it can split
using the non-uniform approach.

I think you need to explain more in the commit message why this was R/O thp
file-backed only, maybe mention some commits that added it etc., I had a quick
glance and even that didn't indicate why.

I look at folio_check_splittable() for instance and see:

	...

	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
		    !mapping_large_folio_support(folio->mapping)) {
			...
			return -EINVAL;
		}
	}

	...

	if ((split_type == SPLIT_TYPE_NON_UNIFORM || new_order) && folio_test_swapcache(folio)) {
		return -EINVAL;
	}

	if (is_huge_zero_folio(folio))
		return -EINVAL;

	if (folio_test_writeback(folio))
		return -EBUSY;

	return 0;
}

None of which suggest that you couldn't have non-uniform splits for other
cases? This at least needs some more explanation/justification in the
commit msg.

>
> -static inline int try_folio_split_to_order(struct folio *folio,
> -		struct page *page, unsigned int new_order)
> +static inline int folio_split(struct folio *folio, unsigned int new_order,
> +		struct page *page, struct list_head *list);

Yeah as Lance pointed out that ; probably shouldn't be there :)

>  {
>  	VM_WARN_ON_ONCE_FOLIO(1, folio);
>  	return -EINVAL;
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 2931d66c16d0..6973b05ec4b8 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -177,7 +177,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
>  	return 0;
>  }
>
> -static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
> +static int folio_split_or_unmap(struct folio *folio, struct page *split_at,
>  				    unsigned long min_order)

I'm not sure the removal of 'try_' is warranted in general in this patch,
as it seems like it's not guaranteed any of these will succeed? Or am I
wrong?

>  {
>  	enum ttu_flags ttu_flags =
> @@ -186,7 +186,7 @@ static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
>  		TTU_IGNORE_MLOCK;
>  	int ret;
>
> -	ret = try_folio_split_to_order(folio, split_at, min_order);
> +	ret = folio_split(folio, min_order, split_at, NULL);
>
>  	/*
>  	 * If the split fails, unmap the folio, so it will be refaulted
> @@ -252,7 +252,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>
>  	min_order = mapping_min_folio_order(folio->mapping);
>  	split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
> -	if (!try_folio_split_or_unmap(folio, split_at, min_order)) {
> +	if (!folio_split_or_unmap(folio, split_at, min_order)) {
>  		/*
>  		 * try to split at offset + length to make sure folios within
>  		 * the range can be dropped, especially to avoid memory waste
> @@ -279,7 +279,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  		/* make sure folio2 is large and does not change its mapping */
>  		if (folio_test_large(folio2) &&
>  		    folio2->mapping == folio->mapping)
> -			try_folio_split_or_unmap(folio2, split_at2, min_order);
> +			folio_split_or_unmap(folio2, split_at2, min_order);
>
>  		folio_unlock(folio2);
>  out:
> --
> 2.43.0
>

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 08/10] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  2026-03-27  1:42 ` [PATCH v1 08/10] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 13:05 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:53PM -0400, Zi Yan wrote:
> READ_ONLY_THP_FOR_FS is no longer present, remove related comment.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Sterba <dsterba@suse.com>

LGTM so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  fs/btrfs/defrag.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
> index 7e2db5d3a4d4..a8d49d9ca981 100644
> --- a/fs/btrfs/defrag.c
> +++ b/fs/btrfs/defrag.c
> @@ -860,9 +860,6 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t
>  		return folio;
>
>  	/*
> -	 * Since we can defragment files opened read-only, we can encounter
> -	 * transparent huge pages here (see CONFIG_READ_ONLY_THP_FOR_FS).
> -	 *
>  	 * The IO for such large folios is not fully tested, thus return
>  	 * an error to reject such folios unless it's an experimental build.
>  	 *
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 09/10] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  2026-03-27  1:42 ` [PATCH v1 09/10] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
@ 2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 13:05 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:54PM -0400, Zi Yan wrote:
> Change the requirement to a file system with large folio support and the
> supported order needs to include PMD_ORDER.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  tools/testing/selftests/mm/khugepaged.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 3fe7ef04ac62..bdcdd31beb1e 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -1086,8 +1086,8 @@ static void usage(void)
>  	fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
>  	fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
>  	fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
> -	fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
> -	fprintf(stderr,	"\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
> +	fprintf(stderr, "\n\t\"file,all\" mem_type requires a file system\n");
> +	fprintf(stderr,	"\twith large folio support (order >= PMD order)\n");
>  	fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
>  	fprintf(stderr,	"\tmounted with huge=advise option for khugepaged tests to work\n");
>  	fprintf(stderr,	"\n\tSupported Options:\n");
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 10/10] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-03-27  1:42 ` [PATCH v1 10/10] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
@ 2026-03-27 13:06   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 13:06 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Thu, Mar 26, 2026 at 09:42:55PM -0400, Zi Yan wrote:
> Any file system with large folio support and the supported orders include
> PMD_ORDER can be used.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Thanks :) Wondered if you'd fix these up :) So:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Cheers, Lorenzo

> ---
>  tools/testing/selftests/mm/guard-regions.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
> index 48e8b1539be3..13e77e48b6ef 100644
> --- a/tools/testing/selftests/mm/guard-regions.c
> +++ b/tools/testing/selftests/mm/guard-regions.c
> @@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
>
>  	/*
>  	 * We must close and re-open local-file backed as read-only for
> -	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
> +	 * MADV_COLLAPSE to work.
>  	 */
>  	if (variant->backing == LOCAL_FILE_BACKED) {
>  		ASSERT_EQ(close(self->fd), 0);
> @@ -2237,9 +2237,10 @@ TEST_F(guard_regions, collapse)
>  	/*
>  	 * Now collapse the entire region. This should fail in all cases.
>  	 *
> -	 * The madvise() call will also fail if CONFIG_READ_ONLY_THP_FOR_FS is
> -	 * not set for the local file case, but we can't differentiate whether
> -	 * this occurred or if the collapse was rightly rejected.
> +	 * The madvise() call will also fail if the file system does not support
> +	 * large folio or the supported orders do not include PMD_ORDER for the
> +	 * local file case, but we can't differentiate whether this occurred or
> +	 * if the collapse was rightly rejected.
>  	 */
>  	EXPECT_NE(madvise(ptr, size, MADV_COLLAPSE), 0);
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  2026-03-27  1:42 ` [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
  2026-03-27 11:45   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 13:33   ` David Hildenbrand (Arm)
  2026-03-27 14:39     ` Zi Yan
  1 sibling, 1 reply; 55+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-27 13:33 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 3/27/26 02:42, Zi Yan wrote:
> No one will be able to use it, so the related code can be removed in the
> coming commits.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/Kconfig | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index bd283958d675..408fc7b82233 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -937,17 +937,6 @@ config THP_SWAP
>  
>  	  For selection by architectures with reasonable THP sizes.
>  
> -config READ_ONLY_THP_FOR_FS
> -	bool "Read-only THP for filesystems (EXPERIMENTAL)"
> -	depends on TRANSPARENT_HUGEPAGE
> -
> -	help
> -	  Allow khugepaged to put read-only file-backed pages in THP.
> -
> -	  This is marked experimental because it is a new feature. Write
> -	  support of file THPs will be developed in the next few release
> -	  cycles.
> -
>  config NO_PAGE_MAPCOUNT
>  	bool "No per-page mapcount (EXPERIMENTAL)"
>  	help

Isn't that usually what we do at the very end, once we've converted all the
code?

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
                     ` (2 preceding siblings ...)
  2026-03-27 12:07   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 13:37   ` David Hildenbrand (Arm)
  2026-03-27 14:43     ` Zi Yan
  3 siblings, 1 reply; 55+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-27 13:37 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 3/27/26 02:42, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> huge option turned on also sets large folio order on mapping, so the check
> also applies to shmem.
> 
> While at it, replace VM_BUG_ON with returning failure values.

Why not VM_WARN_ON_ONCE() ?

These are conditions that must be checked earlier, no?


-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 12:02     ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 13:45       ` Baolin Wang
  2026-03-27 14:12         ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 55+ messages in thread
From: Baolin Wang @ 2026-03-27 13:45 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, David Hildenbrand, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest



On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>> collapse_file() requires FSes supporting large folio with at least
>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>> huge option turned on also sets large folio order on mapping, so the check
>>> also applies to shmem.
>>>
>>> While at it, replace VM_BUG_ON with returning failure values.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>    mm/khugepaged.c | 7 +++++--
>>>    1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index d06d84219e1b..45b12ffb1550 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>    	int nr_none = 0;
>>>    	bool is_shmem = shmem_file(file);
>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>> +		return SCAN_FAIL;
>>
>> This is not true for anonymous shmem, since its large order allocation logic
>> is similar to anonymous memory. That means it will not call
>> mapping_set_large_folios() for anonymous shmem.
>>
>> So I think the check should be:
>>
>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>       return SCAN_FAIL;
> 
> Hmm but in shmem_init() we have:
> 
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
> 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> 	else
> 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
> 
> 	/*
> 	 * Default to setting PMD-sized THP to inherit the global setting and
> 	 * disable all other multi-size THPs.
> 	 */
> 	if (!shmem_orders_configured)
> 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
> #endif
> 
> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
> shmem_enabled_store() updates that if necessary.
> 
> So we're still fine right?
> 
> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
> __shmem_get_inode() which has:
> 
> 	if (sbinfo->huge)
> 		mapping_set_large_folios(inode->i_mapping);
> 
> Shared for both anon shmem and tmpfs-style shmem.
> 
> So I think it's fine as-is.

I'm afraid not. Sorry, I should have been clearer.

First, anonymous shmem large order allocation is dynamically controlled 
via the global interface 
(/sys/kernel/mm/transparent_hugepage/shmem_enabled) and the mTHP 
interfaces 
(/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).

This means that during anonymous shmem initialization, these interfaces 
might be set to 'never', so it will not call mapping_set_large_folios()
because sbinfo->huge is 'SHMEM_HUGE_NEVER'.

Even if shmem large order allocation is subsequently enabled via the 
interfaces, __shmem_file_setup -> mapping_set_large_folios() is not 
called again.

Anonymous shmem behaves similarly to anonymous pages: it is controlled 
by the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() 
to check for allowed large orders, rather than relying on 
mapping_max_folio_order().

The mapping_max_folio_order() is intended to control large page 
allocation only for tmpfs mounts. Therefore, I find the current code 
confusing and think it needs to be fixed:

/* Don't consider 'deny' for emergencies and 'force' for testing */
if (sb != shm_mnt->mnt_sb && sbinfo->huge)
        mapping_set_large_folios(inode->i_mapping);

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig
  2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
                   ` (9 preceding siblings ...)
  2026-03-27  1:42 ` [PATCH v1 10/10] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
@ 2026-03-27 13:46 ` David Hildenbrand (Arm)
  2026-03-27 14:26   ` Zi Yan
  2026-03-27 14:27   ` Lorenzo Stoakes (Oracle)
  10 siblings, 2 replies; 55+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-27 13:46 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 3/27/26 02:42, Zi Yan wrote:
> Hi all,
> 
> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
> read-only THPs for FSes with large folio support (the supported orders
> need to include PMD_ORDER) by default.
> 
> The changes are:
> 1. collapse_file() from mm/khugepaged.c, instead of checking
>    CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
>    of struct address_space of the file is at least PMD_ORDER.
> 2. file_thp_enabled() also checks mapping_max_folio_order() instead.
> 3. truncate_inode_partial_folio() calls folio_split() directly instead
>    of the removed try_folio_split_to_order(), since large folios can
>    only show up on a FS with large folio support.
> 4. nr_thps is removed from struct address_space, since it is no longer
>    needed to drop all read-only THPs from a FS without large folio
>    support when the fd becomes writable. Its related filemap_nr_thps*()
>    are removed too.
> 5. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
> 6. Updated comments in various places.
> 
> Changelog
> ===
> From RFC[1]:
> 1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
>    on by default for all FSes with large folio support and the supported
>    orders includes PMD_ORDER.
> 
> Suggestions and comments are welcome.

Hi! :)

The patch set might be better structured by

1) Teaching code paths to not only respect READ_ONLY_THP_FOR_FS but also
filesystems with large folios. At that point, READ_ONLY_THP_FOR_FS would
have no effect.

2) Removing READ_ONLY_THP_FOR_FS along with all the old cruft that is no
longer required

MADV_COLLAPSE will keep working the whole time.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-03-27 12:23   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 13:58     ` David Hildenbrand (Arm)
  2026-03-27 14:23       ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 55+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-27 13:58 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 3/27/26 13:23, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Mar 26, 2026 at 09:42:48PM -0400, Zi Yan wrote:
>> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
>> large folio support, so that read-only THPs created in these FSes are not
>> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
>> THPs only appear in a FS with large folio support and the supported orders
>> include PMD_ORDRE.
> 
> Typo: PMD_ORDRE -> PMD_ORDER
> 
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> 
> This looks obviously-correct since this stuff wouldn't have been invoked for
> large folio file systems before + they already had to handle it separately, and
> this function is only tied to CONFIG_READ_ONLY_THP_FOR_FS (+ a quick grep
> suggests you didn't miss anything), so:

There could now be a race between collapsing and the file getting opened
r/w.

Are we sure that all code can really deal with that?

IOW, "they already had to handle it separately" -- is that true?
khugepaged would never have collapsed in writable files, so I wonder if
all code paths are prepared for that.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 04/10] fs: remove nr_thps from struct address_space
  2026-03-27  1:42 ` [PATCH v1 04/10] fs: remove nr_thps from struct address_space Zi Yan
  2026-03-27 12:29   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 14:00   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 55+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-27 14:00 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 3/27/26 02:42, Zi Yan wrote:
> filemap_nr_thps*() are removed, the related field, address_space->nr_thps,
> is no longer needed. Remove it.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 13:45       ` Baolin Wang
@ 2026-03-27 14:12         ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:26           ` Baolin Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 14:12 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, David Hildenbrand, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>
>
> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
> > On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
> > >
> > >
> > > On 3/27/26 9:42 AM, Zi Yan wrote:
> > > > collapse_file() requires FSes supporting large folio with at least
> > > > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> > > > huge option turned on also sets large folio order on mapping, so the check
> > > > also applies to shmem.
> > > >
> > > > While at it, replace VM_BUG_ON with returning failure values.
> > > >
> > > > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > > > ---
> > > >    mm/khugepaged.c | 7 +++++--
> > > >    1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index d06d84219e1b..45b12ffb1550 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > > >    	int nr_none = 0;
> > > >    	bool is_shmem = shmem_file(file);
> > > > -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > > > -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> > > > +	/* "huge" shmem sets mapping folio order and passes the check below */
> > > > +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > > > +		return SCAN_FAIL;
> > >
> > > This is not true for anonymous shmem, since its large order allocation logic
> > > is similar to anonymous memory. That means it will not call
> > > mapping_set_large_folios() for anonymous shmem.
> > >
> > > So I think the check should be:
> > >
> > > if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> > >       return SCAN_FAIL;
> >
> > Hmm but in shmem_init() we have:
> >
> > #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
> > 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> > 	else
> > 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
> >
> > 	/*
> > 	 * Default to setting PMD-sized THP to inherit the global setting and
> > 	 * disable all other multi-size THPs.
> > 	 */
> > 	if (!shmem_orders_configured)
> > 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
> > #endif
> >
> > And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
> > shmem_enabled_store() updates that if necessary.
> >
> > So we're still fine right?
> >
> > __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
> > __shmem_get_inode() which has:
> >
> > 	if (sbinfo->huge)
> > 		mapping_set_large_folios(inode->i_mapping);
> >
> > Shared for both anon shmem and tmpfs-style shmem.
> >
> > So I think it's fine as-is.
>
> I'm afraid not. Sorry, I should have been clearer.
>
> First, anonymous shmem large order allocation is dynamically controlled via
> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
> the mTHP interfaces
> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>
> This means that during anonymous shmem initialization, these interfaces
> might be set to 'never', so it will not call mapping_set_large_folios()
> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>
> Even if shmem large order allocation is subsequently enabled via the
> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
> again.

I see your point, oh this is all a bit of a mess...

It feels like entirely the wrong abstraction anyway, since at best you're
getting a global 'is enabled'.

I guess what happened before was we'd never call into this with ! r/o thp for fs
&& ! is_shmem.

But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
suggestion is correct I think actually.

I do hate:

	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)

As a bit of code though. It's horrible.

Let's abstract that...

It'd be nice if we could find a way to clean things up in the lead up to changes
in series like this instead of sticking with the mess, but I guess since it
mostly removes stuff that's ok for now.

>
> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
> check for allowed large orders, rather than relying on
> mapping_max_folio_order().
>
> The mapping_max_folio_order() is intended to control large page allocation
> only for tmpfs mounts. Therefore, I find the current code confusing and
> think it needs to be fixed:
>
> /* Don't consider 'deny' for emergencies and 'force' for testing */
> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
>        mapping_set_large_folios(inode->i_mapping);

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 12:07   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 14:15     ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:46     ` Zi Yan
  1 sibling, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 14:15 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 12:07:22PM +0000, Lorenzo Stoakes (Oracle) wrote:
> > +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > +		return SCAN_FAIL;
>
> As per rest of thread, this looks correct.

Actually, no :)


* Re: [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-03-27 13:58     ` David Hildenbrand (Arm)
@ 2026-03-27 14:23       ` Lorenzo Stoakes (Oracle)
  2026-03-27 15:05         ` Zi Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 14:23 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 02:58:12PM +0100, David Hildenbrand (Arm) wrote:
> On 3/27/26 13:23, Lorenzo Stoakes (Oracle) wrote:
> > On Thu, Mar 26, 2026 at 09:42:48PM -0400, Zi Yan wrote:
> >> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
> >> large folio support, so that read-only THPs created in these FSes are not
> >> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
> >> THPs only appear in a FS with large folio support and the supported orders
> >> include PMD_ORDRE.
> >
> > Typo: PMD_ORDRE -> PMD_ORDER
> >
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >
> > This looks obviously-correct since this stuff wouldn't have been invoked for
> > large folio file systems before + they already had to handle it separately, and
> > this function is only tied to CONFIG_READ_ONLY_THP_FOR_FS (+ a quick grep
> > suggests you didn't miss anything), so:
>
> There could now be a race between collapsing and the file getting opened
> r/w.
>
> Are we sure that all code can really deal with that?
>
> IOW, "they already had to handle it separately" -- is that true?
> khugepaged would have never collapse in writable files, so I wonder if
> all code paths are prepared for that.

OK I guess I overlooked a part of this code... :) see below.

This is fine and would be a no-op anyway

-       if (f->f_mode & FMODE_WRITE) {
-               /*
-                * Depends on full fence from get_write_access() to synchronize
-                * against collapse_file() regarding i_writecount and nr_thps
-                * updates. Ensures subsequent insertion of THPs into the page
-                * cache will fail.
-                */
-               if (filemap_nr_thps(inode->i_mapping)) {

But this:

-       if (!is_shmem) {
-               filemap_nr_thps_inc(mapping);
-               /*
-                * Paired with the fence in do_dentry_open() -> get_write_access()
-                * to ensure i_writecount is up to date and the update to nr_thps
-                * is visible. Ensures the page cache will be truncated if the
-                * file is opened writable.
-                */
-               smp_mb();

We can drop the barrier

-               if (inode_is_open_for_write(mapping->host)) {
-                       result = SCAN_FAIL;

But this is a functional change!

Yup missed this.

-                       filemap_nr_thps_dec(mapping);
-               }
-       }

For below:

-       /*
-        * Undo the updates of filemap_nr_thps_inc for non-SHMEM
-        * file only. This undo is not needed unless failure is
-        * due to SCAN_COPY_MC.
-        */
-       if (!is_shmem && result == SCAN_COPY_MC) {
-               filemap_nr_thps_dec(mapping);
-               /*
-                * Paired with the fence in do_dentry_open() -> get_write_access()
-                * to ensure the update to nr_thps is visible.
-                */
-               smp_mb();
-       }

Here it's probably fine to remove the barrier if it's _only_ there for nr_thps.

>
> --
> Cheers,
>
> David

Sorry Zi, R-b tag withdrawn... :( I missed that 1 functional change there.

Cheers, Lorenzo


* Re: [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig
  2026-03-27 13:46 ` [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig David Hildenbrand (Arm)
@ 2026-03-27 14:26   ` Zi Yan
  2026-03-27 14:27   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 14:26 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 9:46, David Hildenbrand (Arm) wrote:

> On 3/27/26 02:42, Zi Yan wrote:
>> Hi all,
>>
>> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
>> read-only THPs for FSes with large folio support (the supported orders
>> need to include PMD_ORDER) by default.
>>
>> The changes are:
>> 1. collapse_file() from mm/khugepaged.c, instead of checking
>>    CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
>>    of struct address_space of the file is at least PMD_ORDER.
>> 2. file_thp_enabled() also checks mapping_max_folio_order() instead.
>> 3. truncate_inode_partial_folio() calls folio_split() directly instead
>>    of the removed try_folio_split_to_order(), since large folios can
>>    only show up on a FS with large folio support.
>> 4. nr_thps is removed from struct address_space, since it is no longer
>>    needed to drop all read-only THPs from a FS without large folio
>>    support when the fd becomes writable. Its related filemap_nr_thps*()
>>    are removed too.
>> 5. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
>> 6. Updated comments in various places.
>>
>> Changelog
>> ===
>> From RFC[1]:
>> 1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
>>    on by default for all FSes with large folio support and the supported
>>    orders includes PMD_ORDER.
>>
>> Suggestions and comments are welcome.
>
> Hi! :)
>
> The patch set might be better structured by
>
> 1) Teaching code paths to not only respect READ_ONLY_THP_FOR_FS but also
> filesystems with large folios. At that point, READ_ONLY_THP_FOR_FS would
> have no effect.
>
> 2) Removing READ_ONLY_THP_FOR_FS along with all the old cruft that is no
> longer required
>
> MADV_COLLAPSE will keep working the whole time.

OK. I will give this a try.

Best Regards,
Yan, Zi


* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 14:12         ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 14:26           ` Baolin Wang
  2026-03-27 14:31             ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 55+ messages in thread
From: Baolin Wang @ 2026-03-27 14:26 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, David Hildenbrand, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest



On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>>>
>>>>
>>>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>>>> collapse_file() requires FSes supporting large folio with at least
>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>>>> huge option turned on also sets large folio order on mapping, so the check
>>>>> also applies to shmem.
>>>>>
>>>>> While at it, replace VM_BUG_ON with returning failure values.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>>     mm/khugepaged.c | 7 +++++--
>>>>>     1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>> index d06d84219e1b..45b12ffb1550 100644
>>>>> --- a/mm/khugepaged.c
>>>>> +++ b/mm/khugepaged.c
>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>>     	int nr_none = 0;
>>>>>     	bool is_shmem = shmem_file(file);
>>>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>>>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>>>>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>> +		return SCAN_FAIL;
>>>>
>>>> This is not true for anonymous shmem, since its large order allocation logic
>>>> is similar to anonymous memory. That means it will not call
>>>> mapping_set_large_folios() for anonymous shmem.
>>>>
>>>> So I think the check should be:
>>>>
>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>        return SCAN_FAIL;
>>>
>>> Hmm but in shmem_init() we have:
>>>
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
>>> 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>>> 	else
>>> 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
>>>
>>> 	/*
>>> 	 * Default to setting PMD-sized THP to inherit the global setting and
>>> 	 * disable all other multi-size THPs.
>>> 	 */
>>> 	if (!shmem_orders_configured)
>>> 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
>>> #endif
>>>
>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
>>> shmem_enabled_store() updates that if necessary.
>>>
>>> So we're still fine right?
>>>
>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
>>> __shmem_get_inode() which has:
>>>
>>> 	if (sbinfo->huge)
>>> 		mapping_set_large_folios(inode->i_mapping);
>>>
>>> Shared for both anon shmem and tmpfs-style shmem.
>>>
>>> So I think it's fine as-is.
>>
>> I'm afraid not. Sorry, I should have been clearer.
>>
>> First, anonymous shmem large order allocation is dynamically controlled via
>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
>> the mTHP interfaces
>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>>
>> This means that during anonymous shmem initialization, these interfaces
>> might be set to 'never', so it will not call mapping_set_large_folios()
>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>>
>> Even if shmem large order allocation is subsequently enabled via the
>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
>> again.
> 
> I see your point, oh this is all a bit of a mess...
> 
> It feels like entirely the wrong abstraction anyway, since at best you're
> getting a global 'is enabled'.
> 
> I guess what happened before was we'd never call into this with ! r/o thp for fs
> && ! is_shmem.

Right.

> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
> suggestion is correct I think actualyl.
> 
> I do hate:
> 
> 	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> 
> As a bit of code though. It's horrible.

Indeed.

> Let's abstract that...
> 
> It'd be nice if we could find a way to clean things up in the lead up to changes
> in series like this instead of sticking with the mess, but I guess since it
> mostly removes stuff that's ok for now.

I think this check can be removed from this patch.

During the khugepaged's scan, it will call thp_vma_allowable_order() to 
check if the VMA is allowed to collapse into a PMD.

Specifically, within the call chain thp_vma_allowable_order() -> 
__thp_vma_allowable_orders(), shmem is checked via 
shmem_allowable_huge_orders(), while other FSes are checked via 
file_thp_enabled().

For those other filesystems, Patch 5 has already added the following 
check, which I think is sufficient to filter out those FSes that do not 
support large folios:

if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
	return false;


>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
>> check for allowed large orders, rather than relying on
>> mapping_max_folio_order().
>>
>> The mapping_max_folio_order() is intended to control large page allocation
>> only for tmpfs mounts. Therefore, I find the current code confusing and
>> think it needs to be fixed:
>>
>> /* Don't consider 'deny' for emergencies and 'force' for testing */
>> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
>>         mapping_set_large_folios(inode->i_mapping);
> 
> Cheers, Lorenzo



* Re: [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig
  2026-03-27 13:46 ` [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig David Hildenbrand (Arm)
  2026-03-27 14:26   ` Zi Yan
@ 2026-03-27 14:27   ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:30     ` Zi Yan
  1 sibling, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 14:27 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 02:46:43PM +0100, David Hildenbrand (Arm) wrote:
> On 3/27/26 02:42, Zi Yan wrote:
> > Hi all,
> >
> > This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
> > read-only THPs for FSes with large folio support (the supported orders
> > need to include PMD_ORDER) by default.
> >
> > The changes are:
> > 1. collapse_file() from mm/khugepaged.c, instead of checking
> >    CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
> >    of struct address_space of the file is at least PMD_ORDER.
> > 2. file_thp_enabled() also checks mapping_max_folio_order() instead.
> > 3. truncate_inode_partial_folio() calls folio_split() directly instead
> >    of the removed try_folio_split_to_order(), since large folios can
> >    only show up on a FS with large folio support.
> > 4. nr_thps is removed from struct address_space, since it is no longer
> >    needed to drop all read-only THPs from a FS without large folio
> >    support when the fd becomes writable. Its related filemap_nr_thps*()
> >    are removed too.
> > 5. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
> > 6. Updated comments in various places.
> >
> > Changelog
> > ===
> > From RFC[1]:
> > 1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
> >    on by default for all FSes with large folio support and the supported
> >    orders includes PMD_ORDER.
> >
> > Suggestions and comments are welcome.
>
> Hi! :)
>
> The patch set might be better structured by
>
> 1) Teaching code paths to not only respect READ_ONLY_THP_FOR_FS but also
> filesystems with large folios. At that point, READ_ONLY_THP_FOR_FS would
> have no effect.

And also please do some cleaning up of the mess we have in the code base if at
all possible :) I feel like we're constantly building on sand with this, and
should treat every major change as a chance to do this.

Or otherwise we constantly keep leaving this mess around to deal with...

>
> 2) Removing READ_ONLY_THP_FOR_FS along with all the old cruft that is no
> longer required
>
> MADV_COLLAPSE will keep working the whole time.

Obviously everything should keep working throughout any version of this series.

>
> --
> Cheers,
>
> David

Cheers, Lorenzo


* Re: [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig
  2026-03-27 14:27   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 14:30     ` Zi Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 14:30 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: David Hildenbrand (Arm), Matthew Wilcox (Oracle), Song Liu,
	Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 10:27, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 02:46:43PM +0100, David Hildenbrand (Arm) wrote:
>> On 3/27/26 02:42, Zi Yan wrote:
>>> Hi all,
>>>
>>> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
>>> read-only THPs for FSes with large folio support (the supported orders
>>> need to include PMD_ORDER) by default.
>>>
>>> The changes are:
>>> 1. collapse_file() from mm/khugepaged.c, instead of checking
>>>    CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
>>>    of struct address_space of the file is at least PMD_ORDER.
>>> 2. file_thp_enabled() also checks mapping_max_folio_order() instead.
>>> 3. truncate_inode_partial_folio() calls folio_split() directly instead
>>>    of the removed try_folio_split_to_order(), since large folios can
>>>    only show up on a FS with large folio support.
>>> 4. nr_thps is removed from struct address_space, since it is no longer
>>>    needed to drop all read-only THPs from a FS without large folio
>>>    support when the fd becomes writable. Its related filemap_nr_thps*()
>>>    are removed too.
>>> 5. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
>>> 6. Updated comments in various places.
>>>
>>> Changelog
>>> ===
>>> From RFC[1]:
>>> 1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
>>>    on by default for all FSes with large folio support and the supported
>>>    orders includes PMD_ORDER.
>>>
>>> Suggestions and comments are welcome.
>>
>> Hi! :)
>>
>> The patch set might be better structured by
>>
>> 1) Teaching code paths to not only respect READ_ONLY_THP_FOR_FS but also
>> filesystems with large folios. At that point, READ_ONLY_THP_FOR_FS would
>> have no effect.
>
> And also please do some cleaning up of the mess we have in the code base if at
> all possible :) I feel like we're constantly building on sand with this, and
> should treat every major change as a chance to do this.
>
> Or otherwise we constantly keep leaving this mess around to deal with...

Got it. Let me read through the feedback on the individual patches and come up with
a plan.

>
>>
>> 2) Removing READ_ONLY_THP_FOR_FS along with all the old cruft that is no
>> longer required
>>
>> MADV_COLLAPSE will keep working the whole time.
>
> Obviously everything should keep working throughout any version of this series.
>
Ack.


Best Regards,
Yan, Zi


* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 14:26           ` Baolin Wang
@ 2026-03-27 14:31             ` Lorenzo Stoakes (Oracle)
  2026-03-27 15:00               ` Zi Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 14:31 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
	David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
	Andrew Morton, David Hildenbrand, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote:
>
>
> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
> > On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
> > >
> > >
> > > On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
> > > > On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
> > > > >
> > > > >
> > > > > On 3/27/26 9:42 AM, Zi Yan wrote:
> > > > > > collapse_file() requires FSes supporting large folio with at least
> > > > > > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> > > > > > huge option turned on also sets large folio order on mapping, so the check
> > > > > > also applies to shmem.
> > > > > >
> > > > > > While at it, replace VM_BUG_ON with returning failure values.
> > > > > >
> > > > > > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > > > > > ---
> > > > > >     mm/khugepaged.c | 7 +++++--
> > > > > >     1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > > index d06d84219e1b..45b12ffb1550 100644
> > > > > > --- a/mm/khugepaged.c
> > > > > > +++ b/mm/khugepaged.c
> > > > > > @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > > > > >     	int nr_none = 0;
> > > > > >     	bool is_shmem = shmem_file(file);
> > > > > > -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > > > > > -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> > > > > > +	/* "huge" shmem sets mapping folio order and passes the check below */
> > > > > > +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > > > > > +		return SCAN_FAIL;
> > > > >
> > > > > This is not true for anonymous shmem, since its large order allocation logic
> > > > > is similar to anonymous memory. That means it will not call
> > > > > mapping_set_large_folios() for anonymous shmem.
> > > > >
> > > > > So I think the check should be:
> > > > >
> > > > > if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> > > > >        return SCAN_FAIL;
> > > >
> > > > Hmm but in shmem_init() we have:
> > > >
> > > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
> > > > 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> > > > 	else
> > > > 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
> > > >
> > > > 	/*
> > > > 	 * Default to setting PMD-sized THP to inherit the global setting and
> > > > 	 * disable all other multi-size THPs.
> > > > 	 */
> > > > 	if (!shmem_orders_configured)
> > > > 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
> > > > #endif
> > > >
> > > > And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
> > > > shmem_enabled_store() updates that if necessary.
> > > >
> > > > So we're still fine right?
> > > >
> > > > __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
> > > > __shmem_get_inode() which has:
> > > >
> > > > 	if (sbinfo->huge)
> > > > 		mapping_set_large_folios(inode->i_mapping);
> > > >
> > > > Shared for both anon shmem and tmpfs-style shmem.
> > > >
> > > > So I think it's fine as-is.
> > >
> > > I'm afraid not. Sorry, I should have been clearer.
> > >
> > > First, anonymous shmem large order allocation is dynamically controlled via
> > > the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
> > > the mTHP interfaces
> > > (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
> > >
> > > This means that during anonymous shmem initialization, these interfaces
> > > might be set to 'never', so it will not call mapping_set_large_folios()
> > > because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
> > >
> > > Even if shmem large order allocation is subsequently enabled via the
> > > interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
> > > again.
> >
> > I see your point, oh this is all a bit of a mess...
> >
> > It feels like entirely the wrong abstraction anyway, since at best you're
> > getting a global 'is enabled'.
> >
> > I guess what happened before was we'd never call into this with ! r/o thp for fs
> > && ! is_shmem.
>
> Right.
>
> > But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
> > suggestion is correct I think actually.
> >
> > I do hate:
> >
> > 	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> >
> > As a bit of code though. It's horrible.
>
> Indeed.
>
> > Let's abstract that...
> >
> > It'd be nice if we could find a way to clean things up in the lead up to changes
> > in series like this instead of sticking with the mess, but I guess since it
> > mostly removes stuff that's ok for now.
>
> I think this check can be removed from this patch.
>
> During the khugepaged's scan, it will call thp_vma_allowable_order() to
> check if the VMA is allowed to collapse into a PMD.
>
> Specifically, within the call chain thp_vma_allowable_order() ->
> __thp_vma_allowable_orders(), shmem is checked via
> shmem_allowable_huge_orders(), while other FSes are checked via
> file_thp_enabled().

It sucks not to have an assert. Maybe in that case make it a
VM_WARN_ON_ONCE().

I hate that you're left tracing things back like that...

>
> For those other filesystems, Patch 5 has already added the following check,
> which I think is sufficient to filter out those FSes that do not support
> large folios:
>
> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> 	return false;

But 2 < 5, so that would be a bisection hazard, and we won't tolerate those.

>
>
> > > Anonymous shmem behaves similarly to anonymous pages: it is controlled by
> > > the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
> > > check for allowed large orders, rather than relying on
> > > mapping_max_folio_order().
> > >
> > > The mapping_max_folio_order() is intended to control large page allocation
> > > only for tmpfs mounts. Therefore, I find the current code confusing and
> > > think it needs to be fixed:
> > >
> > > /* Don't consider 'deny' for emergencies and 'force' for testing */
> > > if (sb != shm_mnt->mnt_sb && sbinfo->huge)
> > >         mapping_set_large_folios(inode->i_mapping);
> >
> > Cheers, Lorenzo
>

Cheers, Lorenzo


* Re: [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  2026-03-27 13:33   ` David Hildenbrand (Arm)
@ 2026-03-27 14:39     ` Zi Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 14:39 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 9:33, David Hildenbrand (Arm) wrote:

> On 3/27/26 02:42, Zi Yan wrote:
>> No one will be able to use it, so the related code can be removed in the
>> coming commits.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/Kconfig | 11 -----------
>>  1 file changed, 11 deletions(-)
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index bd283958d675..408fc7b82233 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -937,17 +937,6 @@ config THP_SWAP
>>
>>  	  For selection by architectures with reasonable THP sizes.
>>
>> -config READ_ONLY_THP_FOR_FS
>> -	bool "Read-only THP for filesystems (EXPERIMENTAL)"
>> -	depends on TRANSPARENT_HUGEPAGE
>> -
>> -	help
>> -	  Allow khugepaged to put read-only file-backed pages in THP.
>> -
>> -	  This is marked experimental because it is a new feature. Write
>> -	  support of file THPs will be developed in the next few release
>> -	  cycles.
>> -
>>  config NO_PAGE_MAPCOUNT
>>  	bool "No per-page mapcount (EXPERIMENTAL)"
>>  	help
>
> Isn't that usually what we do at the very end when we converted all the
> code?

The rationale is that after removing the Kconfig option, the related code is
always disabled and the following patches can remove it piece by piece. The
approach you are hinting at might be to 1) remove all users of
READ_ONLY_THP_FOR_FS, making collapse_file() reject FSes without large folio
support, then 2) remove the other READ_ONLY_THP_FOR_FS related code. That
might still cause confusion, since READ_ONLY_THP_FOR_FS would still be
present while its functionality is gone.

But as you pointed out in the cover letter, MADV_COLLAPSE needs to keep
working throughout the patchset, so I will move this patch to a later stage,
once MADV_COLLAPSE works on FSes with large folio support.

WDYT?

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 13:37   ` David Hildenbrand (Arm)
@ 2026-03-27 14:43     ` Zi Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 14:43 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 9:37, David Hildenbrand (Arm) wrote:

> On 3/27/26 02:42, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>> huge option turned on also sets large folio order on mapping, so the check
>> also applies to shmem.
>>
>> While at it, replace VM_BUG_ON with returning failure values.
>
> Why not VM_WARN_ON_ONCE() ?
>
> These are conditions that must be checked earlier, no?

For start & (HPAGE_PMD_NR - 1), yes. I can convert it to
VM_WARN_ON_ONCE().

For mapping_max_folio_order(mapping) < PMD_ORDER, I should probably
move it to collapse_scan_file() to avoid wasting scanning time when
the file does not support large folios. Then I can turn it into a
VM_WARN_ON_ONCE().

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 12:07   ` Lorenzo Stoakes (Oracle)
  2026-03-27 14:15     ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 14:46     ` Zi Yan
  1 sibling, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 14:46 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 8:07, Lorenzo Stoakes (Oracle) wrote:

> On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>> huge option turned on also sets large folio order on mapping, so the check
>> also applies to shmem.
>>
>> While at it, replace VM_BUG_ON with returning failure values.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
>
>
>> ---
>>  mm/khugepaged.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d06d84219e1b..45b12ffb1550 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>  	int nr_none = 0;
>>  	bool is_shmem = shmem_file(file);
>>
>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>
> I think this isn't quite clear and could be improved to e.g.:
>
> 	/*
> 	 * Either anon shmem supports huge pages as set by shmem_enabled sysfs,
> 	 * or a shmem file system mounted with the "huge" option.
> 	 */
>
>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>> +		return SCAN_FAIL;
>
> As per rest of thread, this looks correct.

Will respond to that thread.

>
>> +	if (start & (HPAGE_PMD_NR - 1))
>> +		return SCAN_ADDRESS_RANGE;
>
> Hmm, we're kinda making this - presumably buggy situation - into a valid input
> that just fails the scan.
>
> Maybe just make it a VM_WARN_ON_ONCE()? Or if we want to avoid propagating the
> bug that'd cause it any further:
>
> 	if (start & (HPAGE_PMD_NR - 1)) {
> 		VM_WARN_ON_ONCE(true);
> 		return SCAN_ADDRESS_RANGE;
> 	}
>
> Or similar.

As I responded to David, will change it to VM_WARN_ON_ONCE().

>
>>
>>  	result = alloc_charge_folio(&new_folio, mm, cc);
>>  	if (result != SCAN_SUCCEED)
>> --
>> 2.43.0
>>
>
> Cheers, Lorenzo


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 14:31             ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 15:00               ` Zi Yan
  2026-03-27 16:22                 ` Lance Yang
  0 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27 15:00 UTC (permalink / raw)
  To: Baolin Wang, Lorenzo Stoakes (Oracle)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>>>>
>>>>
>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>>>>>> collapse_file() requires FSes supporting large folio with at least
>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>>>>>> huge option turned on also sets large folio order on mapping, so the check
>>>>>>> also applies to shmem.
>>>>>>>
>>>>>>> While at it, replace VM_BUG_ON with returning failure values.
>>>>>>>
>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>> ---
>>>>>>>     mm/khugepaged.c | 7 +++++--
>>>>>>>     1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>>> index d06d84219e1b..45b12ffb1550 100644
>>>>>>> --- a/mm/khugepaged.c
>>>>>>> +++ b/mm/khugepaged.c
>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>>>>     	int nr_none = 0;
>>>>>>>     	bool is_shmem = shmem_file(file);
>>>>>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>>>>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>>>>>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>>>>>>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>>> +		return SCAN_FAIL;
>>>>>>
>>>>>> This is not true for anonymous shmem, since its large order allocation logic
>>>>>> is similar to anonymous memory. That means it will not call
>>>>>> mapping_set_large_folios() for anonymous shmem.
>>>>>>
>>>>>> So I think the check should be:
>>>>>>
>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>>        return SCAN_FAIL;
>>>>>
>>>>> Hmm but in shmem_init() we have:
>>>>>
>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>> 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
>>>>> 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>>>>> 	else
>>>>> 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
>>>>>
>>>>> 	/*
>>>>> 	 * Default to setting PMD-sized THP to inherit the global setting and
>>>>> 	 * disable all other multi-size THPs.
>>>>> 	 */
>>>>> 	if (!shmem_orders_configured)
>>>>> 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
>>>>> #endif
>>>>>
>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
>>>>> shmem_enabled_store() updates that if necessary.
>>>>>
>>>>> So we're still fine right?
>>>>>
>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
>>>>> __shmem_get_inode() which has:
>>>>>
>>>>> 	if (sbinfo->huge)
>>>>> 		mapping_set_large_folios(inode->i_mapping);
>>>>>
>>>>> Shared for both anon shmem and tmpfs-style shmem.
>>>>>
>>>>> So I think it's fine as-is.
>>>>
>>>> I'm afraid not. Sorry, I should have been clearer.
>>>>
>>>> First, anonymous shmem large order allocation is dynamically controlled via
>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
>>>> the mTHP interfaces
>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>>>>
>>>> This means that during anonymous shmem initialization, these interfaces
>>>> might be set to 'never'. so it will not call mapping_set_large_folios()
>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>>>>
>>>> Even if shmem large order allocation is subsequently enabled via the
>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
>>>> again.
>>>
>>> I see your point, oh this is all a bit of a mess...
>>>
>>> It feels like entirely the wrong abstraction anyway, since at best you're
>>> getting a global 'is enabled'.
>>>
>>> I guess what happened before was we'd never call into this with ! r/o thp for fs
>>> && ! is_shmem.
>>
>> Right.
>>
>>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
>>> suggestion is correct I think actualyl.
>>>
>>> I do hate:
>>>
>>> 	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>
>>> As a bit of code though. It's horrible.
>>
>> Indeed.
>>
>>> Let's abstract that...
>>>
>>> It'd be nice if we could find a way to clean things up in the lead up to changes
>>> in series like this instead of sticking with the mess, but I guess since it
>>> mostly removes stuff that's ok for now.
>>
>> I think this check can be removed from this patch.
>>
>> During the khugepaged's scan, it will call thp_vma_allowable_order() to
>> check if the VMA is allowed to collapse into a PMD.
>>
>> Specifically, within the call chain thp_vma_allowable_order() ->
>> __thp_vma_allowable_orders(), shmem is checked via
>> shmem_allowable_huge_orders(), while other FSes are checked via
>> file_thp_enabled().

But in the madvise(MADV_COLLAPSE) case, IIRC, it ignores the shmem
huge config and can perform the collapse anyway. This means that
without !is_shmem the check will break madvise(MADV_COLLAPSE). Let me
know if I got it wrong, since I was on that TVA_FORCED_COLLAPSE email
thread but do not remember everything there.


>
> It sucks not to have an assert. Maybe in that case make it a
> VM_WARN_ON_ONCE().

Will do that as I replied to David already.

>
> I hate that you're left tracing things back like that...
>
>>
>> For those other filesystems, Patch 5 has already added the following check,
>> which I think is sufficient to filter out those FSes that do not support
>> large folios:
>>
>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>> 	return false;
>
> 2 < 5, we won't tolerate bisection hazards.
>
>>
>>
>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
>>>> check for allowed large orders, rather than relying on
>>>> mapping_max_folio_order().
>>>>
>>>> The mapping_max_folio_order() is intended to control large page allocation
>>>> only for tmpfs mounts. Therefore, I find the current code confusing and
>>>> think it needs to be fixed:
>>>>
>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */
>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
>>>>         mapping_set_large_folios(inode->i_mapping);
>>>

Hi Baolin,

Do you want to send a fix for this?

Also, I wonder how I can distinguish between anonymous shmem code and
tmpfs code. I thought they were the same thing except for their
different user interfaces, but it seems that I was wrong.


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-03-27 14:23       ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 15:05         ` Zi Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 15:05 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: David Hildenbrand (Arm), Matthew Wilcox (Oracle), Song Liu,
	Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 10:23, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 02:58:12PM +0100, David Hildenbrand (Arm) wrote:
>> On 3/27/26 13:23, Lorenzo Stoakes (Oracle) wrote:
>>> On Thu, Mar 26, 2026 at 09:42:48PM -0400, Zi Yan wrote:
>>>> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
>>>> large folio support, so that read-only THPs created in these FSes are not
>>>> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
>>>> THPs only appear in a FS with large folio support and the supported orders
>>>> include PMD_ORDRE.
>>>
>>> Typo: PMD_ORDRE -> PMD_ORDER
>>>
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>
>>> This looks obviously-correct since this stuff wouldn't have been invoked for
>>> large folio file systems before + they already had to handle it separately, and
>>> this function is only tied to CONFIG_READ_ONLY_THP_FOR_FS (+ a quick grep
>>> suggests you didn't miss anything), so:
>>
>> There could now be a race between collapsing and the file getting opened
>> r/w.
>>
>> Are we sure that all code can really deal with that?
>>
>> IOW, "they already had to handle it separately" -- is that true?
>> khugepaged would have never collapse in writable files, so I wonder if
>> all code paths are prepared for that.
>
> OK I guess I overlooked a part of this code... :) see below.
>
> This is fine and would be a no-op anyway
>
> -       if (f->f_mode & FMODE_WRITE) {
> -               /*
> -                * Depends on full fence from get_write_access() to synchronize
> -                * against collapse_file() regarding i_writecount and nr_thps
> -                * updates. Ensures subsequent insertion of THPs into the page
> -                * cache will fail.
> -                */
> -               if (filemap_nr_thps(inode->i_mapping)) {
>
> But this:
>
> -       if (!is_shmem) {
> -               filemap_nr_thps_inc(mapping);
> -               /*
> -                * Paired with the fence in do_dentry_open() -> get_write_access()
> -                * to ensure i_writecount is up to date and the update to nr_thps
> -                * is visible. Ensures the page cache will be truncated if the
> -                * file is opened writable.
> -                */
> -               smp_mb();
>
> We can drop barrier
>
> -               if (inode_is_open_for_write(mapping->host)) {
> -                       result = SCAN_FAIL;
>
> But this is a functional change!
>
> Yup missed this.

But I added

+	if (!is_shmem && inode_is_open_for_write(mapping->host))
+		result = SCAN_FAIL;

That keeps the original bail-out, right?

>
> -                       filemap_nr_thps_dec(mapping);
> -               }
> -       }
>
> For below:
>
> -       /*
> -        * Undo the updates of filemap_nr_thps_inc for non-SHMEM
> -        * file only. This undo is not needed unless failure is
> -        * due to SCAN_COPY_MC.
> -        */
> -       if (!is_shmem && result == SCAN_COPY_MC) {
> -               filemap_nr_thps_dec(mapping);
> -               /*
> -                * Paired with the fence in do_dentry_open() -> get_write_access()
> -                * to ensure the update to nr_thps is visible.
> -                */
> -               smp_mb();
> -       }
>
> Here is probably fine to remove if barrier _only_ for nr_thps.
>
>>
>> --
>> Cheers,
>>
>> David
>
> Sorry Zi, R-b tag withdrawn... :( I missed that 1 functional change there.
>
> Cheers, Lorenzo


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27 12:42   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 15:12     ` Zi Yan
  2026-03-27 15:29       ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27 15:12 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:

> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
>> Replace it with a check on the max folio order of the file's address space
>> mapping, making sure PMD_ORDER is supported.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/huge_memory.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c7873dbdc470..1da1467328a3 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>  {
>>  	struct inode *inode;
>>
>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>> -		return false;
>> -
>>  	if (!vma->vm_file)
>>  		return false;
>>
>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>  	if (IS_ANON_FILE(inode))
>>  		return false;
>>
>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>> +		return false;
>> +
>
> At this point I think this should be a separate function quite honestly and
> share it with 2/10's use, and then you can put the comment in here re: anon
> shmem etc.
>
> Though that won't apply here of course as shmem_allowable_huge_orders() would
> have been invoked :)
>
> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
> unfortunate.
>
> Buuut having said that is this right actually?
>
> Because we have:
>
> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> 			return orders;
>
> Above it, and now you're enabling huge folio file systems to do non-page fault
> THP and that's err... isn't that quite a big change?

That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
This patchset changes the condition from all FSes to FSes with large folio
support.

Will add a helper, mapping_support_pmd_folio(), for
mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.

>
> So yeah probably no to this patch as is :) we should just drop
> file_thp_enabled()?



>
>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>  }
>>
>> --
>> 2.43.0
>>


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27 15:12     ` Zi Yan
@ 2026-03-27 15:29       ` Lorenzo Stoakes (Oracle)
  2026-03-27 15:43         ` Zi Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 15:29 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>
> > On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> >> Replace it with a check on the max folio order of the file's address space
> >> mapping, making sure PMD_ORDER is supported.
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> ---
> >>  mm/huge_memory.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index c7873dbdc470..1da1467328a3 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>  {
> >>  	struct inode *inode;
> >>
> >> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> >> -		return false;
> >> -
> >>  	if (!vma->vm_file)
> >>  		return false;
> >>
> >> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>  	if (IS_ANON_FILE(inode))
> >>  		return false;
> >>
> >> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> >> +		return false;
> >> +
> >
> > At this point I think this should be a separate function quite honestly and
> > share it with 2/10's use, and then you can put the comment in here re: anon
> > shmem etc.
> >
> > Though that won't apply here of course as shmem_allowable_huge_orders() would
> > have been invoked :)
> >
> > But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
> > unfortunate.
> >
> > Buuut having said that is this right actually?
> >
> > Because we have:
> >
> > 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> > 			return orders;
> >
> > Above it, and now you're enabling huge folio file systems to do non-page fault
> > THP and that's err... isn't that quite a big change?
>
> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
> This patchset changes the condition from all FSes to FSes with large folio
> support.

No, READ_ONLY_THP_FOR_FS operates differently.

It explicitly _only_ is allowed for MADV_COLLAPSE and only if the file is
mounted read-only.

So due to:

		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
			return orders;

		if (((!in_pf || smaps)) && file_thp_enabled(vma))
			return orders;

                      |    PF     | MADV_COLLAPSE | khugepaged |
		      |-----------|---------------|------------|
large folio fs        |     ✓     |       x       |      x     |
READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |

After this change:

                      |    PF     | MADV_COLLAPSE | khugepaged |
		      |-----------|---------------|------------|
large folio fs        |     ✓     |       ✓       |      ?     |

(I hope we're not enabling khugepaged for large folio fs - which shouldn't
be necessary anyway as we try to give them folios on page fault and they
use the thp-friendly get_unmapped_area etc. :)

We shouldn't be doing this.

It should remain:

                      |    PF     | MADV_COLLAPSE | khugepaged |
		      |-----------|---------------|------------|
large folio fs        |     ✓     |       x       |      x     |

If we're going to remove it, we should first _just remove it_, not
simultaneously increase the scope of what all the MADV_COLLAPSE code is
doing without any confidence in any of it working properly.

And it makes the whole series misleading - you're actually _enabling_ a
feature not (only) _removing_ one.

So let's focus as David suggested on one thing at a time, incrementally.

And let's please try and sort some of this confusing mess out in the code
if at all possible...

>
> Will add a helper, mapping_support_pmd_folio(), for
> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>
> >
> > So yeah probably no to this patch as is :) we should just drop
> > file_thp_enabled()?
>
>
>
> >
> >>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >>  }
> >>
> >> --
> >> 2.43.0
> >>
>
>
> Best Regards,
> Yan, Zi

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio()
  2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 15:35     ` Zi Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 15:35 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 9:05, Lorenzo Stoakes (Oracle) wrote:

> On Thu, Mar 26, 2026 at 09:42:52PM -0400, Zi Yan wrote:
>> After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or
>> not. folio_split() can be used on a FS with large folio support without
>> worrying about getting a THP on a FS without large folio support.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  include/linux/huge_mm.h | 25 ++-----------------------
>>  mm/truncate.c           |  8 ++++----
>>  2 files changed, 6 insertions(+), 27 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 1258fa37e85b..171de8138e98 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -389,27 +389,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>  	return split_huge_page_to_list_to_order(page, NULL, new_order);
>>  }
>>
>> -/**
>> - * try_folio_split_to_order() - try to split a @folio at @page to @new_order
>> - * using non uniform split.
>> - * @folio: folio to be split
>> - * @page: split to @new_order at the given page
>> - * @new_order: the target split order
>> - *
>> - * Try to split a @folio at @page using non uniform split to @new_order, if
>> - * non uniform split is not supported, fall back to uniform split. After-split
>> - * folios are put back to LRU list. Use min_order_for_split() to get the lower
>> - * bound of @new_order.
>> - *
>> - * Return: 0 - split is successful, otherwise split failed.
>> - */
>> -static inline int try_folio_split_to_order(struct folio *folio,
>> -		struct page *page, unsigned int new_order)
>> -{
>> -	if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
>> -		return split_huge_page_to_order(&folio->page, new_order);
>> -	return folio_split(folio, new_order, page, NULL);
>> -}
>>  static inline int split_huge_page(struct page *page)
>>  {
>>  	return split_huge_page_to_list_to_order(page, NULL, 0);
>> @@ -641,8 +620,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
>>  	return -EINVAL;
>>  }
>
> Hmm there's nothing in the comment or obvious jumping out at me to explain why
> this is R/O thp file-backed only?
>
> This seems like an arbitrary helper that just figures out whether it can split
> using the non-uniform approach.
>
> I think you need to explain more in the commit message why this was R/O thp
> file-backed only, maybe mention some commits that added it etc., I had a quick
> glance and even that didn't indicate why.
>
> I look at folio_check_splittable() for instance and see:
>
> 	...
>
> 	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
> 		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> 		    !mapping_large_folio_support(folio->mapping)) {
> 			...
> 			return -EINVAL;
> 		}
> 	}
>
> 	...
>
> 	if ((split_type == SPLIT_TYPE_NON_UNIFORM || new_order) && folio_test_swapcache(folio)) {
> 		return -EINVAL;
> 	}
>
> 	if (is_huge_zero_folio(folio))
> 		return -EINVAL;
>
> 	if (folio_test_writeback(folio))
> 		return -EBUSY;
>
> 	return 0;
> }
>
> None of which suggest that you couldn't have non-uniform splits for other
> cases? This at least needs some more explanation/justification in the
> commit msg.

Sure.

When READ_ONLY_THP_FOR_FS was present, a PMD-sized large pagecache folio
could appear in a FS without large folio support after khugepaged or
madvise(MADV_COLLAPSE) created it. During truncate_inode_partial_folio(),
such a folio is split, and if the FS does not support large folios, it
has to be split to order-0 folios and cannot be split non-uniformly into
folios of various orders. try_folio_split_to_order() was added to handle
this situation: it checks folio_check_splittable(..., SPLIT_TYPE_NON_UNIFORM)
to detect whether the large folio was created by READ_ONLY_THP_FOR_FS on
a FS without large folio support. Now that READ_ONLY_THP_FOR_FS is
removed, all large pagecache folios are created on FSes with large folio
support, so this function is no longer needed and all large pagecache
folios can be split non-uniformly.

>
>>
>> -static inline int try_folio_split_to_order(struct folio *folio,
>> -		struct page *page, unsigned int new_order)
>> +static inline int folio_split(struct folio *folio, unsigned int new_order,
>> +		struct page *page, struct list_head *list);
>
> Yeah as Lance pointed out that ; probably shouldn't be there :)

I was trying to fix the folio_split() signature mismatch locally and did
a simple copy-paste from above. Will fix it.

>
>>  {
>>  	VM_WARN_ON_ONCE_FOLIO(1, folio);
>>  	return -EINVAL;
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 2931d66c16d0..6973b05ec4b8 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -177,7 +177,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
>>  	return 0;
>>  }
>>
>> -static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
>> +static int folio_split_or_unmap(struct folio *folio, struct page *split_at,
>>  				    unsigned long min_order)
>
> I'm not sure the removal of 'try_' is warranted in general in this patch,
> as it seems like it's not guaranteed any of these will succeed? Or am I
> wrong?

I added explanation above.

To summarize, without READ_ONLY_THP_FOR_FS, large pagecache folios can
only appear on FSes supporting large folios, so they all can be split
non-uniformly. Trying a non-uniform split and then falling back to a
uniform split is no longer needed. If a non-uniform split fails, a
uniform split will fail too, barring race conditions like an elevated
folio refcount.

BTW, sashiko asked if this breaks large shmem swapcache folio split[1].
The answer is no, since large shmem swapcache folio split is not supported yet.


[1] https://sashiko.dev/#/patchset/20260327014255.2058916-1-ziy%40nvidia.com?patch=11647

>
>>  {
>>  	enum ttu_flags ttu_flags =
>> @@ -186,7 +186,7 @@ static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
>>  		TTU_IGNORE_MLOCK;
>>  	int ret;
>>
>> -	ret = try_folio_split_to_order(folio, split_at, min_order);
>> +	ret = folio_split(folio, min_order, split_at, NULL);
>>
>>  	/*
>>  	 * If the split fails, unmap the folio, so it will be refaulted
>> @@ -252,7 +252,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>
>>  	min_order = mapping_min_folio_order(folio->mapping);
>>  	split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
>> -	if (!try_folio_split_or_unmap(folio, split_at, min_order)) {
>> +	if (!folio_split_or_unmap(folio, split_at, min_order)) {
>>  		/*
>>  		 * try to split at offset + length to make sure folios within
>>  		 * the range can be dropped, especially to avoid memory waste
>> @@ -279,7 +279,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  		/* make sure folio2 is large and does not change its mapping */
>>  		if (folio_test_large(folio2) &&
>>  		    folio2->mapping == folio->mapping)
>> -			try_folio_split_or_unmap(folio2, split_at2, min_order);
>> +			folio_split_or_unmap(folio2, split_at2, min_order);
>>
>>  		folio_unlock(folio2);


sashiko asked whether the folio containing split_at2 could be split in
a parallel thread, in which case splitting folio2 at split_at2 could
cause an issue[1].

This is handled in __folio_split(). It has a folio != page_folio(split_at)
check.

[1] https://sashiko.dev/#/patchset/20260327014255.2058916-1-ziy%40nvidia.com?patch=11647
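That guard can be sketched as a userspace model (struct and function names
are illustrative, not the kernel's struct page/folio API): if a parallel
split has already moved split_at into a different folio, page_folio(split_at)
no longer matches the locked folio and the split bails out.

```c
/* Userspace model of the __folio_split() guard (illustrative structs,
 * not the kernel's struct page/folio). */
struct folio_m { int id; };
struct page_m { struct folio_m *owner; };

/* Models page_folio(): which folio currently contains this page. */
static struct folio_m *page_folio_m(const struct page_m *page)
{
	return page->owner;
}

/* Refuse to split if split_at no longer belongs to the locked folio,
 * i.e. a parallel thread already split it away. */
static int folio_split_m(struct folio_m *folio, struct page_m *split_at)
{
	if (page_folio_m(split_at) != folio)
		return -1; /* the kernel would return an errno here */
	return 0;          /* split proceeds */
}

/* Normal case: split_at is still inside the folio. */
static int split_same_folio(void)
{
	struct folio_m f = { 1 };
	struct page_m p = { &f };
	return folio_split_m(&f, &p);
}

/* Raced case: split_at was moved into another folio concurrently. */
static int split_raced_folio(void)
{
	struct folio_m f = { 1 }, other = { 2 };
	struct page_m p = { &other };
	return folio_split_m(&f, &p);
}
```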

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27 15:29       ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 15:43         ` Zi Yan
  2026-03-27 16:08           ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27 15:43 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
>> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>>
>>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
>>>> Replace it with a check on the max folio order of the file's address space
>>>> mapping, making sure PMD_ORDER is supported.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>>  mm/huge_memory.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index c7873dbdc470..1da1467328a3 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>  {
>>>>  	struct inode *inode;
>>>>
>>>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>>>> -		return false;
>>>> -
>>>>  	if (!vma->vm_file)
>>>>  		return false;
>>>>
>>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>  	if (IS_ANON_FILE(inode))
>>>>  		return false;
>>>>
>>>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>>>> +		return false;
>>>> +
>>>
>>> At this point I think this should be a separate function quite honestly and
>>> share it with 2/10's use, and then you can put the comment in here re: anon
>>> shmem etc.
>>>
>>> Though that won't apply here of course as shmem_allowable_huge_orders() would
>>> have been invoked :)
>>>
>>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
>>> unfortunate.
>>>
>>> Buuut having said that is this right actually?
>>>
>>> Because we have:
>>>
>>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
>>> 			return orders;
>>>
>>> Above it, and now you're enabling huge folio file systems to do non-page fault
>>> THP and that's err... isn't that quite a big change?
>>
>> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
>> This patchset changes the condition from all FSes to FSes with large folio
>> support.
>
> No, READ_ONLY_THP_FOR_FS operates differently.
>
> It explicitly _only_ is allowed for MADV_COLLAPSE and only if the file is
> mounted read-only.
>
> So due to:
>
> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> 			return orders;
>
> 		if (((!in_pf || smaps)) && file_thp_enabled(vma))
> 			return orders;
>
>                       |    PF     | MADV_COLLAPSE | khugepaged |
> 		      |-----------|---------------|------------|
> large folio fs        |     ✓     |       x       |      x     |
> READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
>
> After this change:
>
>                       |    PF     | MADV_COLLAPSE | khugepaged |
> 		      |-----------|---------------|------------|
> large folio fs        |     ✓     |       ✓       |      ?     |
>
> (I hope we're not enabling khugepaged for large folio fs - which shouldn't
> be necessary anyway as we try to give them folios on page fault and they
> use thp-friendly get_unmapped_area etc. :)
>
> We shouldn't be doing this.
>
> It should remain:
>
>                       |    PF     | MADV_COLLAPSE | khugepaged |
> 		      |-----------|---------------|------------|
> large folio fs        |     ✓     |       x       |      x     |
>
> If we're going to remove it, we should first _just remove it_, not
> simultaneously increase the scope of what all the MADV_COLLAPSE code is
> doing without any confidence in any of it working properly.
>
> And it makes the whole series misleading - you're actually _enabling_ a
> feature not (only) _removing_ one.

That is what my RFC patch does, but David and willy told me to do this. :)
IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will
get THPs via MADV_COLLAPSE or khugepaged. So removing the code like I
did in the RFC would cause regressions.

I guess I need to rename the series to avoid confusion. How about:

Remove read-only THP support for FSes without large folio support.

[1] https://lore.kernel.org/all/7382046f-7c58-4a3e-ab34-b2704355b7d5@kernel.org/

>
> So let's focus as David suggested on one thing at a time, incrementally.
>
> And let's please try and sort some of this confusing mess out in the code
> if at all possible...
>
>>
>> Will add a helper, mapping_support_pmd_folio(), for
>> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>>
>>>
>>> So yeah probably no to this patch as is :) we should just drop
>>> file_thp_enabled()?
>>
>>
>>
>>>
>>>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>>>  }
>>>>
>>>> --
>>>> 2.43.0
>>>>
>>
>>
>> Best Regards,
>> Yan, Zi
>
> Cheers, Lorenzo


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27 15:43         ` Zi Yan
@ 2026-03-27 16:08           ` Lorenzo Stoakes (Oracle)
  2026-03-27 16:12             ` Zi Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 16:08 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 11:43:57AM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote:
>
> > On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
> >> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
> >>
> >>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> >>>> Replace it with a check on the max folio order of the file's address space
> >>>> mapping, making sure PMD_ORDER is supported.
> >>>>
> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>> ---
> >>>>  mm/huge_memory.c | 6 +++---
> >>>>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index c7873dbdc470..1da1467328a3 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>>>  {
> >>>>  	struct inode *inode;
> >>>>
> >>>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> >>>> -		return false;
> >>>> -
> >>>>  	if (!vma->vm_file)
> >>>>  		return false;
> >>>>
> >>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>>>  	if (IS_ANON_FILE(inode))
> >>>>  		return false;
> >>>>
> >>>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> >>>> +		return false;
> >>>> +
> >>>
> >>> At this point I think this should be a separate function quite honestly and
> >>> share it with 2/10's use, and then you can put the comment in here re: anon
> >>> shmem etc.
> >>>
> >>> Though that won't apply here of course as shmem_allowable_huge_orders() would
> >>> have been invoked :)
> >>>
> >>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
> >>> unfortunate.
> >>>
> >>> Buuut having said that is this right actually?
> >>>
> >>> Because we have:
> >>>
> >>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> >>> 			return orders;
> >>>
> >>> Above it, and now you're enabling huge folio file systems to do non-page fault
> >>> THP and that's err... isn't that quite a big change?
> >>
> >> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
> >> This patchset changes the condition from all FSes to FSes with large folio
> >> support.
> >
> > No, READ_ONLY_THP_FOR_FS operates differently.
> >
> > It explicitly _only_ is allowed for MADV_COLLAPSE and only if the file is
> > mounted read-only.
> >
> > So due to:
> >
> > 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> > 			return orders;
> >
> > 		if (((!in_pf || smaps)) && file_thp_enabled(vma))
> > 			return orders;
> >
> >                       |    PF     | MADV_COLLAPSE | khugepaged |
> > 		      |-----------|---------------|------------|
> > large folio fs        |     ✓     |       x       |      x     |
> > READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
> >
> > After this change:
> >
> >                       |    PF     | MADV_COLLAPSE | khugepaged |
> > 		      |-----------|---------------|------------|
> > large folio fs        |     ✓     |       ✓       |      ?     |
> >
> > (I hope we're not enabling khugepaged for large folio fs - which shouldn't
> > be necessary anyway as we try to give them folios on page fault and they
> > use thp-friendly get_unmapped_area etc. :)
> >
> > We shouldn't be doing this.
> >
> > It should remain:
> >
> >                       |    PF     | MADV_COLLAPSE | khugepaged |
> > 		      |-----------|---------------|------------|
> > large folio fs        |     ✓     |       x       |      x     |
> >
> > If we're going to remove it, we should first _just remove it_, not
> > simultaneously increase the scope of what all the MADV_COLLAPSE code is
> > doing without any confidence in any of it working properly.
> >
> > And it makes the whole series misleading - you're actually _enabling_ a
> > feature not (only) _removing_ one.
>
> That is what my RFC patch does, but David and willy told me to do this. :)
> IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will
> get THP via MADV_COLLAPSE or khugepaged. So removing the code like I
> did in RFC would cause regressions.

OK I think we're dealing with a union of the two states here.

READ_ONLY_THP_FOR_FS is separate from large folio support, as checked by
file_thp_enabled():

static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
	struct inode *inode;

	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
		return false;

	if (!vma->vm_file)
		return false;

	inode = file_inode(vma->vm_file);

	if (IS_ANON_FILE(inode))
		return false;

	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}

So actually:

                       |    PF     | MADV_COLLAPSE | khugepaged |
		       |-----------|---------------|------------|
 large folio fs        |     ✓     |       x       |      x     |
 READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
 both!                 |     ✓     |       ✓       |      ✓     |

(Where it's implied it's a read-only mapping, obviously, for the latter two
cases.)

Now without READ_ONLY_THP_FOR_FS you're going to:

                       |    PF     | MADV_COLLAPSE | khugepaged |
		       |-----------|---------------|------------|
 large folio fs        |     ✓     |       x       |      x     |
 large folio + r/o     |     ✓     |       ✓       |      ✓     |

And intentionally leaving behind the 'not large folio fs, r/o' case because
those file systems need to implement large folio support.

I guess we'll regress those users but we don't care?

I do think all this needs to be spelled out in the commit message though as it's
subtle.

Turns out this PitA config option is going to kick and scream a bit first before
it goes...
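The two gating conditions quoted above can be modeled as a small userspace
function that reproduces both tables. This is a sketch with illustrative
names (thp_allowed, SRC_* are not kernel identifiers); it mirrors the two
`if` checks from __thp_vma_allowable_orders() quoted earlier in the thread.

```c
#include <stdbool.h>

/* Userspace model of the two checks in __thp_vma_allowable_orders()
 * quoted above (enum and function names are illustrative). */
enum thp_source { SRC_PF, SRC_MADV_COLLAPSE, SRC_KHUGEPAGED };

static bool thp_allowed(enum thp_source src, bool smaps,
			bool has_huge_fault, bool file_thp_enabled)
{
	bool in_pf = (src == SRC_PF);

	if ((in_pf || smaps) && has_huge_fault)
		return true; /* page-fault path: large folio FSes */
	if ((!in_pf || smaps) && file_thp_enabled)
		return true; /* MADV_COLLAPSE / khugepaged path */
	return false;
}
```

With has_huge_fault standing in for "large folio fs" and file_thp_enabled
for "READ_ONLY_THP_FOR_FS allows it", evaluating the three sources
reproduces the three rows of the table above, including the "both!" row.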

>
> I guess I need to rename the series to avoid confusion. How about?
>
> Remove read-only THP support for FSes without large folio support.

Yup that'd be better :)

Cheers, Lorenzo

>
> [1] https://lore.kernel.org/all/7382046f-7c58-4a3e-ab34-b2704355b7d5@kernel.org/
>
> >
> > So let's focus as David suggested on one thing at a time, incrementally.
> >
> > And let's please try and sort some of this confusing mess out in the code
> > if at all possible...
> >
> >>
> >> Will add a helper, mapping_support_pmd_folio(), for
> >> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
> >>
> >>>
> >>> So yeah probably no to this patch as is :) we should just drop
> >>> file_thp_enabled()?
> >>
> >>
> >>
> >>>
> >>>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >>>>  }
> >>>>
> >>>> --
> >>>> 2.43.0
> >>>>
> >>
> >>
> >> Best Regards,
> >> Yan, Zi
> >
> > Cheers, Lorenzo
>
>
> Best Regards,
> Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27 16:08           ` Lorenzo Stoakes (Oracle)
@ 2026-03-27 16:12             ` Zi Yan
  2026-03-27 16:14               ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 55+ messages in thread
From: Zi Yan @ 2026-03-27 16:12 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 12:08, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 11:43:57AM -0400, Zi Yan wrote:
>> On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote:
>>
>>> On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
>>>> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>>>>
>>>>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
>>>>>> Replace it with a check on the max folio order of the file's address space
>>>>>> mapping, making sure PMD_ORDER is supported.
>>>>>>
>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 6 +++---
>>>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index c7873dbdc470..1da1467328a3 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>>>  {
>>>>>>  	struct inode *inode;
>>>>>>
>>>>>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>>>>>> -		return false;
>>>>>> -
>>>>>>  	if (!vma->vm_file)
>>>>>>  		return false;
>>>>>>
>>>>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>>>  	if (IS_ANON_FILE(inode))
>>>>>>  		return false;
>>>>>>
>>>>>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>>>>>> +		return false;
>>>>>> +
>>>>>
>>>>> At this point I think this should be a separate function quite honestly and
>>>>> share it with 2/10's use, and then you can put the comment in here re: anon
>>>>> shmem etc.
>>>>>
>>>>> Though that won't apply here of course as shmem_allowable_huge_orders() would
>>>>> have been invoked :)
>>>>>
>>>>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
>>>>> unfortunate.
>>>>>
>>>>> Buuut having said that is this right actually?
>>>>>
>>>>> Because we have:
>>>>>
>>>>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
>>>>> 			return orders;
>>>>>
>>>>> Above it, and now you're enabling huge folio file systems to do non-page fault
>>>>> THP and that's err... isn't that quite a big change?
>>>>
>>>> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
>>>> This patchset changes the condition from all FSes to FSes with large folio
>>>> support.
>>>
>>> No, READ_ONLY_THP_FOR_FS operates differently.
>>>
>>> It explicitly _only_ is allowed for MADV_COLLAPSE and only if the file is
>>> mounted read-only.
>>>
>>> So due to:
>>>
>>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
>>> 			return orders;
>>>
>>> 		if (((!in_pf || smaps)) && file_thp_enabled(vma))
>>> 			return orders;
>>>
>>>                       |    PF     | MADV_COLLAPSE | khugepaged |
>>> 		      |-----------|---------------|------------|
>>> large folio fs        |     ✓     |       x       |      x     |
>>> READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
>>>
>>> After this change:
>>>
>>>                       |    PF     | MADV_COLLAPSE | khugepaged |
>>> 		      |-----------|---------------|------------|
>>> large folio fs        |     ✓     |       ✓       |      ?     |
>>>
>>> (I hope we're not enabling khugepaged for large folio fs - which shouldn't
>>> be necessary anyway as we try to give them folios on page fault and they
>>> use thp-friendly get_unmapped_area etc. :)
>>>
>>> We shouldn't be doing this.
>>>
>>> It should remain:
>>>
>>>                       |    PF     | MADV_COLLAPSE | khugepaged |
>>> 		      |-----------|---------------|------------|
>>> large folio fs        |     ✓     |       x       |      x     |
>>>
>>> If we're going to remove it, we should first _just remove it_, not
>>> simultaneously increase the scope of what all the MADV_COLLAPSE code is
>>> doing without any confidence in any of it working properly.
>>>
>>> And it makes the whole series misleading - you're actually _enabling_ a
>>> feature not (only) _removing_ one.
>>
>> That is what my RFC patch does, but David and willy told me to do this. :)
>> IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will
>> get THP via MADV_COLLAPSE or khugepaged. So removing the code like I
>> did in RFC would cause regressions.
>
> OK I think we're dealing with a union of the two states here.
>
> READ_ONLY_THP_FOR_FS is separate from large folio support, as checked by
> file_thp_enabled():
>
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> 	struct inode *inode;
>
> 	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> 		return false;
>
> 	if (!vma->vm_file)
> 		return false;
>
> 	inode = file_inode(vma->vm_file);
>
> 	if (IS_ANON_FILE(inode))
> 		return false;
>
> 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
> So actually:
>
>                        |    PF     | MADV_COLLAPSE | khugepaged |
> 		       |-----------|---------------|------------|
>  large folio fs        |     ✓     |       x       |      x     |
>  READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
>  both!                 |     ✓     |       ✓       |      ✓     |
>
> (Where it's implied it's a read-only mapping, obviously, for the latter two
> cases.)
>
> Now without READ_ONLY_THP_FOR_FS you're going to:
>
>                        |    PF     | MADV_COLLAPSE | khugepaged |
> 		       |-----------|---------------|------------|
>  large folio fs        |     ✓     |       x       |      x     |
>  large folio + r/o     |     ✓     |       ✓       |      ✓     |
>
> And intentionally leaving behind the 'not large folio fs, r/o' case because
> those file systems need to implement large folio support.
>
> I guess we'll regress those users but we don't care?

Yes. This also motivates FSes without large folio support to add large folio
support instead of relying on the READ_ONLY_THP_FOR_FS hack.

>
> I do think all this needs to be spelled out in the commit message though as it's
> subtle.
>
> Turns out this PitA config option is going to kick and scream a bit first before
> it goes...

Sure. I will shamelessly steal your tables. Thank you for the contribution. ;)

>
>>
>> I guess I need to rename the series to avoid confusion. How about?
>>
>> Remove read-only THP support for FSes without large folio support.
>
> Yup that'd be better :)
>
> Cheers, Lorenzo
>
>>
>> [1] https://lore.kernel.org/all/7382046f-7c58-4a3e-ab34-b2704355b7d5@kernel.org/
>>
>>>
>>> So let's focus as David suggested on one thing at a time, incrementally.
>>>
>>> And let's please try and sort some of this confusing mess out in the code
>>> if at all possible...
>>>
>>>>
>>>> Will add a helper, mapping_support_pmd_folio(), for
>>>> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>>>>
>>>>>
>>>>> So yeah probably no to this patch as is :) we should just drop
>>>>> file_thp_enabled()?
>>>>
>>>>
>>>>
>>>>>
>>>>>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>>>>>  }
>>>>>>
>>>>>> --
>>>>>> 2.43.0
>>>>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Yan, Zi
>>>
>>> Cheers, Lorenzo
>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-03-27 16:12             ` Zi Yan
@ 2026-03-27 16:14               ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-27 16:14 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	David Hildenbrand, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Fri, Mar 27, 2026 at 12:12:04PM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 12:08, Lorenzo Stoakes (Oracle) wrote:
> > So actually:
> >
> >                        |    PF     | MADV_COLLAPSE | khugepaged |
> > 		       |-----------|---------------|------------|
> >  large folio fs        |     ✓     |       x       |      x     |
> >  READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
> >  both!                 |     ✓     |       ✓       |      ✓     |
> >
> > (Where it's implied it's a read-only mapping, obviously, for the latter two
> > cases.)
> >
> > Now without READ_ONLY_THP_FOR_FS you're going to:
> >
> >                        |    PF     | MADV_COLLAPSE | khugepaged |
> > 		       |-----------|---------------|------------|
> >  large folio fs        |     ✓     |       x       |      x     |
> >  large folio + r/o     |     ✓     |       ✓       |      ✓     |
> >
> > And intentionally leaving behind the 'not large folio fs, r/o' case because
> > those file systems need to implement large folio support.
> >
> > I guess we'll regress those users but we don't care?
>
> Yes. This also motivates FSes without large folio support to add large folio
> support instead of relying on READ_ONLY_THP_FOR_FS hack.

Ack that's something I can back :)

>
> >
> > I do think all this needs to be spelled out in the commit message though as it's
> > subtle.
> >
> > Turns out this PitA config option is going to kick and scream a bit first before
> > it goes...
>
> Sure. I will shamelessly steal your tables. Thank you for the contribution. ;)
>

Haha good I love to spread ASCII art :)

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 15:00               ` Zi Yan
@ 2026-03-27 16:22                 ` Lance Yang
  2026-03-27 16:30                   ` Zi Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Lance Yang @ 2026-03-27 16:22 UTC (permalink / raw)
  To: ziy
  Cc: baolin.wang, ljs, willy, songliubraving, clm, dsterba, viro,
	brauner, jack, akpm, david, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest


On Fri, Mar 27, 2026 at 11:00:26AM -0400, Zi Yan wrote:
>On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote:
>
>> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote:
>>>
>>>
>>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
>>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
>>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>>>>>>> collapse_file() requires FSes supporting large folio with at least
>>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>>>>>>> huge option turned on also sets large folio order on mapping, so the check
>>>>>>>> also applies to shmem.
>>>>>>>>
>>>>>>>> While at it, replace VM_BUG_ON with returning failure values.
>>>>>>>>
>>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>>> ---
>>>>>>>>     mm/khugepaged.c | 7 +++++--
>>>>>>>>     1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>>>> index d06d84219e1b..45b12ffb1550 100644
>>>>>>>> --- a/mm/khugepaged.c
>>>>>>>> +++ b/mm/khugepaged.c
>>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>>>>>     	int nr_none = 0;
>>>>>>>>     	bool is_shmem = shmem_file(file);
>>>>>>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>>>>>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>>>>>>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>>>>>>>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>>>> +		return SCAN_FAIL;
>>>>>>>
>>>>>>> This is not true for anonymous shmem, since its large order allocation logic
>>>>>>> is similar to anonymous memory. That means it will not call
>>>>>>> mapping_set_large_folios() for anonymous shmem.
>>>>>>>
>>>>>>> So I think the check should be:
>>>>>>>
>>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>>>        return SCAN_FAIL;
>>>>>>
>>>>>> Hmm but in shmem_init() we have:
>>>>>>
>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>>> 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
>>>>>> 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>>>>>> 	else
>>>>>> 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
>>>>>>
>>>>>> 	/*
>>>>>> 	 * Default to setting PMD-sized THP to inherit the global setting and
>>>>>> 	 * disable all other multi-size THPs.
>>>>>> 	 */
>>>>>> 	if (!shmem_orders_configured)
>>>>>> 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
>>>>>> #endif
>>>>>>
>>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
>>>>>> shmem_enabled_store() updates that if necessary.
>>>>>>
>>>>>> So we're still fine right?
>>>>>>
>>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
>>>>>> __shmem_get_inode() which has:
>>>>>>
>>>>>> 	if (sbinfo->huge)
>>>>>> 		mapping_set_large_folios(inode->i_mapping);
>>>>>>
>>>>>> Shared for both anon shmem and tmpfs-style shmem.
>>>>>>
>>>>>> So I think it's fine as-is.
>>>>>
>>>>> I'm afraid not. Sorry, I should have been clearer.
>>>>>
>>>>> First, anonymous shmem large order allocation is dynamically controlled via
>>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
>>>>> the mTHP interfaces
>>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>>>>>
>>>>> This means that during anonymous shmem initialization, these interfaces
>>>>> might be set to 'never'. so it will not call mapping_set_large_folios()
>>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>>>>>
>>>>> Even if shmem large order allocation is subsequently enabled via the
>>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
>>>>> again.
>>>>
>>>> I see your point, oh this is all a bit of a mess...
>>>>
>>>> It feels like entirely the wrong abstraction anyway, since at best you're
>>>> getting a global 'is enabled'.
>>>>
>>>> I guess what happened before was we'd never call into this with ! r/o thp for fs
>>>> && ! is_shmem.
>>>
>>> Right.
>>>
>>>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
>>>> suggestion is correct I think, actually.
>>>>
>>>> I do hate:
>>>>
>>>> 	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>
>>>> As a bit of code though. It's horrible.
>>>
>>> Indeed.
>>>
>>>> Let's abstract that...
>>>>
>>>> It'd be nice if we could find a way to clean things up in the lead up to changes
>>>> in series like this instead of sticking with the mess, but I guess since it
>>>> mostly removes stuff that's ok for now.
>>>
>>> I think this check can be removed from this patch.
>>>
>>> During the khugepaged's scan, it will call thp_vma_allowable_order() to
>>> check if the VMA is allowed to collapse into a PMD.
>>>
>>> Specifically, within the call chain thp_vma_allowable_order() ->
>>> __thp_vma_allowable_orders(), shmem is checked via
>>> shmem_allowable_huge_orders(), while other FSes are checked via
>>> file_thp_enabled().
>
>But for the madvise(MADV_COLLAPSE) case, IIRC, it ignores the shmem huge config
>and can perform the collapse anyway. This means that without !is_shmem the check
>will break madvise(MADV_COLLAPSE). Let me know if I got it wrong, since

Right. That will break MADV_COLLAPSE, IIUC.

For MADV_COLLAPSE on anonymous shmem, eligibility is determined by the
TVA_FORCED_COLLAPSE path via shmem_allowable_huge_orders(), not by
whether the inode mapping got mapping_set_large_folios() at creation
time.

Using mmap(MAP_SHARED | MAP_ANONYMOUS):
- create time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=never
- collapse time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=always

With the !is_shmem guard, collapse succeeds. Without it, the same setup
fails with -EINVAL.
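The difference can be modeled with two tiny userspace predicates. This is a
sketch of the discussion, not kernel code: gate_as_posted and
gate_with_shmem_exception are hypothetical names. The gate as posted looks
only at the mapping's max folio order, while the guarded variant exempts
shmem, whose eligibility is decided elsewhere via
shmem_allowable_huge_orders().

```c
#include <stdbool.h>

/* Userspace model of the collapse_file() gate (illustrative names).
 * mapping_has_pmd_order models
 * mapping_max_folio_order(mapping) >= PMD_ORDER. */
static bool gate_as_posted(bool is_shmem, bool mapping_has_pmd_order)
{
	(void)is_shmem; /* patch as posted ignores is_shmem */
	return mapping_has_pmd_order;
}

/* Baolin's suggested variant: shmem bypasses the mapping check, since
 * its eligibility comes from shmem_allowable_huge_orders() instead. */
static bool gate_with_shmem_exception(bool is_shmem,
				      bool mapping_has_pmd_order)
{
	return is_shmem || mapping_has_pmd_order;
}
```

For anon shmem created while shmem_enabled=never, the mapping never got
large folio orders set, so the unguarded gate rejects a later
MADV_COLLAPSE even after shmem orders are re-enabled, while the guarded
gate lets it through.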

Thanks,
Lance

>I was in that TVA_FORCED_COLLAPSE email thread but do not remember
>everything there.
>
>
>>
>> It sucks not to have an assert. Maybe in that case make it a
>> VM_WARN_ON_ONCE().
>
>Will do that as I replied to David already.
>
>>
>> I hate that you're left tracing things back like that...
>>
>>>
>>> For those other filesystems, Patch 5 has already added the following check,
>>> which I think is sufficient to filter out those FSes that do not support
>>> large folios:
>>>
>>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>>> 	return false;
>>
>> 2 < 5, we won't tolerate bisection hazards.
>>
>>>
>>>
>>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
>>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
>>>>> check for allowed large orders, rather than relying on
>>>>> mapping_max_folio_order().
>>>>>
>>>>> The mapping_max_folio_order() is intended to control large page allocation
>>>>> only for tmpfs mounts. Therefore, I find the current code confusing and
>>>>> think it needs to be fixed:
>>>>>
>>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */
>>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
>>>>>         mapping_set_large_folios(inode->i_mapping);
>>>>
>
>Hi Baolin,
>
>Do you want to send a fix for this?
>
>Also I wonder how I can distinguish between anonymous shmem code and tmpfs code.
>I thought they were the same thing except that they have different user interfaces,
>but it seems that I was wrong.
>
>
>Best Regards,
>Yan, Zi
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-03-27 16:22                 ` Lance Yang
@ 2026-03-27 16:30                   ` Zi Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Zi Yan @ 2026-03-27 16:30 UTC (permalink / raw)
  To: Lance Yang
  Cc: baolin.wang, ljs, willy, songliubraving, clm, dsterba, viro,
	brauner, jack, akpm, david, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, vbabka, rppt, surenb, mhocko, shuah,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 27 Mar 2026, at 12:22, Lance Yang wrote:

> On Fri, Mar 27, 2026 at 11:00:26AM -0400, Zi Yan wrote:
>> On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote:
>>
>>> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote:
>>>>
>>>>
>>>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
>>>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
>>>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>>>>>>>> collapse_file() requires FSes supporting large folio with at least
>>>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>>>>>>>> huge option turned on also sets large folio order on mapping, so the check
>>>>>>>>> also applies to shmem.
>>>>>>>>>
>>>>>>>>> While at it, replace VM_BUG_ON with returning failure values.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>>>>> ---
>>>>>>>>>     mm/khugepaged.c | 7 +++++--
>>>>>>>>>     1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>>>>> index d06d84219e1b..45b12ffb1550 100644
>>>>>>>>> --- a/mm/khugepaged.c
>>>>>>>>> +++ b/mm/khugepaged.c
>>>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>>>>>>     	int nr_none = 0;
>>>>>>>>>     	bool is_shmem = shmem_file(file);
>>>>>>>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>>>>>>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>>>>>>>> +	/* "huge" shmem sets mapping folio order and passes the check below */
>>>>>>>>> +	if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>>>>> +		return SCAN_FAIL;
>>>>>>>>
>>>>>>>> This is not true for anonymous shmem, since its large order allocation logic
>>>>>>>> is similar to anonymous memory. That means it will not call
>>>>>>>> mapping_set_large_folios() for anonymous shmem.
>>>>>>>>
>>>>>>>> So I think the check should be:
>>>>>>>>
>>>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>>>>        return SCAN_FAIL;
>>>>>>>
>>>>>>> Hmm but in shmem_init() we have:
>>>>>>>
>>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>>>> 	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
>>>>>>> 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>>>>>>> 	else
>>>>>>> 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
>>>>>>>
>>>>>>> 	/*
>>>>>>> 	 * Default to setting PMD-sized THP to inherit the global setting and
>>>>>>> 	 * disable all other multi-size THPs.
>>>>>>> 	 */
>>>>>>> 	if (!shmem_orders_configured)
>>>>>>> 		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
>>>>>>> #endif
>>>>>>>
>>>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
>>>>>>> shmem_enabled_store() updates that if necessary.
>>>>>>>
>>>>>>> So we're still fine right?
>>>>>>>
>>>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
>>>>>>> __shmem_get_inode() which has:
>>>>>>>
>>>>>>> 	if (sbinfo->huge)
>>>>>>> 		mapping_set_large_folios(inode->i_mapping);
>>>>>>>
>>>>>>> Shared for both anon shmem and tmpfs-style shmem.
>>>>>>>
>>>>>>> So I think it's fine as-is.
>>>>>>
>>>>>> I'm afraid not. Sorry, I should have been clearer.
>>>>>>
>>>>>> First, anonymous shmem large order allocation is dynamically controlled via
>>>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
>>>>>> the mTHP interfaces
>>>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>>>>>>
>>>>>> This means that during anonymous shmem initialization, these interfaces
>>>>>> might be set to 'never'. so it will not call mapping_set_large_folios()
>>>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>>>>>>
>>>>>> Even if shmem large order allocation is subsequently enabled via the
>>>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
>>>>>> again.
>>>>>
>>>>> I see your point, oh this is all a bit of a mess...
>>>>>
>>>>> It feels like entirely the wrong abstraction anyway, since at best you're
>>>>> getting a global 'is enabled'.
>>>>>
>>>>> I guess what happened before was we'd never call into this with ! r/o thp for fs
>>>>> && ! is_shmem.
>>>>
>>>> Right.
>>>>
>>>>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
>>>>> suggestion is correct I think actually.
>>>>>
>>>>> I do hate:
>>>>>
>>>>> 	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>>
>>>>> As a bit of code though. It's horrible.
>>>>
>>>> Indeed.
>>>>
>>>>> Let's abstract that...
>>>>>
>>>>> It'd be nice if we could find a way to clean things up in the lead up to changes
>>>>> in series like this instead of sticking with the mess, but I guess since it
>>>>> mostly removes stuff that's ok for now.
>>>>
>>>> I think this check can be removed from this patch.
>>>>
>>>> During the khugepaged's scan, it will call thp_vma_allowable_order() to
>>>> check if the VMA is allowed to collapse into a PMD.
>>>>
>>>> Specifically, within the call chain thp_vma_allowable_order() ->
>>>> __thp_vma_allowable_orders(), shmem is checked via
>>>> shmem_allowable_huge_orders(), while other FSes are checked via
>>>> file_thp_enabled().
>>
>> But for the madvise(MADV_COLLAPSE) case, IIRC, it ignores the shmem huge config
>> and can perform collapse anyway. This means without !is_shmem the check
>> will break madvise(MADV_COLLAPSE). Let me know if I get it wrong, since
>
> Right. That will break MADV_COLLAPSE, IIUC.
>
> For MADV_COLLAPSE on anonymous shmem, eligibility is determined by the
> TVA_FORCED_COLLAPSE path via shmem_allowable_huge_orders(), not by
> whether the inode mapping got mapping_set_large_folios() at creation
> time.
>
> Using mmap(MAP_SHARED | MAP_ANONYMOUS):
> - create time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=never
> - collapse time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=always
>
> With the !is_shmem guard, collapse succeeds. Without it, the same setup
> fails with -EINVAL.

Thank you for the confirmation. I will fix it.

>
> Thanks,
> Lance
>
>> I was in that TVA_FORCED_COLLAPSE email thread but do not remember
>> everything there.
>>
>>
>>>
>>> It sucks not to have an assert. Maybe in that case make it a
>>> VM_WARN_ON_ONCE().
>>
>> Will do that as I replied to David already.
>>
>>>
>>> I hate that you're left tracing things back like that...
>>>
>>>>
>>>> For those other filesystems, Patch 5 has already added the following check,
>>>> which I think is sufficient to filter out those FSes that do not support
>>>> large folios:
>>>>
>>>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>>>> 	return false;
>>>
>>> 2 < 5, we won't tolerate bisection hazards.
>>>
>>>>
>>>>
>>>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
>>>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
>>>>>> check for allowed large orders, rather than relying on
>>>>>> mapping_max_folio_order().
>>>>>>
>>>>>> The mapping_max_folio_order() is intended to control large page allocation
>>>>>> only for tmpfs mounts. Therefore, I find the current code confusing and
>>>>>> think it needs to be fixed:
>>>>>>
>>>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */
>>>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
>>>>>>         mapping_set_large_folios(inode->i_mapping);
>>>>>
>>
>> Hi Baolin,
>>
>> Do you want to send a fix for this?
>>
>> Also I wonder how I can distinguish between anonymous shmem code and tmpfs code.
>> I thought they were the same thing except that they have different user interfaces,
>> but it seems that I was wrong.
>>
>>
>> Best Regards,
>> Yan, Zi
>>
>>


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2026-03-27 16:30 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-27  1:42 [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig Zi Yan
2026-03-27  1:42 ` [PATCH v1 01/10] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-03-27 11:45   ` Lorenzo Stoakes (Oracle)
2026-03-27 13:33   ` David Hildenbrand (Arm)
2026-03-27 14:39     ` Zi Yan
2026-03-27  1:42 ` [PATCH v1 02/10] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-03-27  7:29   ` Lance Yang
2026-03-27  7:35     ` Lance Yang
2026-03-27  9:44   ` Baolin Wang
2026-03-27 12:02     ` Lorenzo Stoakes (Oracle)
2026-03-27 13:45       ` Baolin Wang
2026-03-27 14:12         ` Lorenzo Stoakes (Oracle)
2026-03-27 14:26           ` Baolin Wang
2026-03-27 14:31             ` Lorenzo Stoakes (Oracle)
2026-03-27 15:00               ` Zi Yan
2026-03-27 16:22                 ` Lance Yang
2026-03-27 16:30                   ` Zi Yan
2026-03-27 12:07   ` Lorenzo Stoakes (Oracle)
2026-03-27 14:15     ` Lorenzo Stoakes (Oracle)
2026-03-27 14:46     ` Zi Yan
2026-03-27 13:37   ` David Hildenbrand (Arm)
2026-03-27 14:43     ` Zi Yan
2026-03-27  1:42 ` [PATCH v1 03/10] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-03-27  9:32   ` Lance Yang
2026-03-27 12:23   ` Lorenzo Stoakes (Oracle)
2026-03-27 13:58     ` David Hildenbrand (Arm)
2026-03-27 14:23       ` Lorenzo Stoakes (Oracle)
2026-03-27 15:05         ` Zi Yan
2026-03-27  1:42 ` [PATCH v1 04/10] fs: remove nr_thps from struct address_space Zi Yan
2026-03-27 12:29   ` Lorenzo Stoakes (Oracle)
2026-03-27 14:00   ` David Hildenbrand (Arm)
2026-03-27  1:42 ` [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-03-27 12:42   ` Lorenzo Stoakes (Oracle)
2026-03-27 15:12     ` Zi Yan
2026-03-27 15:29       ` Lorenzo Stoakes (Oracle)
2026-03-27 15:43         ` Zi Yan
2026-03-27 16:08           ` Lorenzo Stoakes (Oracle)
2026-03-27 16:12             ` Zi Yan
2026-03-27 16:14               ` Lorenzo Stoakes (Oracle)
2026-03-27  1:42 ` [PATCH v1 06/10] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-03-27 12:50   ` Lorenzo Stoakes (Oracle)
2026-03-27  1:42 ` [PATCH v1 07/10] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-03-27  3:33   ` Lance Yang
2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
2026-03-27 15:35     ` Zi Yan
2026-03-27  1:42 ` [PATCH v1 08/10] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
2026-03-27  1:42 ` [PATCH v1 09/10] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-03-27 13:05   ` Lorenzo Stoakes (Oracle)
2026-03-27  1:42 ` [PATCH v1 10/10] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
2026-03-27 13:06   ` Lorenzo Stoakes (Oracle)
2026-03-27 13:46 ` [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig David Hildenbrand (Arm)
2026-03-27 14:26   ` Zi Yan
2026-03-27 14:27   ` Lorenzo Stoakes (Oracle)
2026-03-27 14:30     ` Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox