* [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:40 ` David Hildenbrand (Arm)
2026-04-25 22:01 ` Andrew Morton
2026-04-24 2:49 ` [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
` (11 subsequent siblings)
12 siblings, 2 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
collapse_file() requires FSes to support large folios of at least
PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that requirement.
MADV_COLLAPSE ignores the shmem huge config, so exclude shmem from the check.
While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
Add a helper function mapping_pmd_thp_support() for FSes supporting large
folios of at least PMD_ORDER.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/pagemap.h | 9 +++++++++
mm/khugepaged.c | 10 ++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..5b4313d91137 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -513,6 +513,15 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi
return mapping_max_folio_order(mapping) > 0;
}
+static inline bool mapping_pmd_thp_support(const struct address_space *mapping)
+{
+ /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
+ VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON);
+
+ return mapping_max_folio_order(mapping) >= PMD_ORDER;
+}
+
+
/* Return the maximum folio size for this pagecache mapping, in bytes. */
static inline size_t mapping_max_folio_size(const struct address_space *mapping)
{
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7d48d4fbd5f3..79f051eb6195 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2235,8 +2235,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
int nr_none = 0;
bool is_shmem = shmem_file(file);
- VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
- VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+ /*
+ * MADV_COLLAPSE ignores shmem huge config, so do not check shmem
+ *
+ * TODO: once shmem always calls mapping_set_large_folios() on its
+ * mapping, the shmem check can be removed.
+ */
+ VM_WARN_ON_ONCE(!is_shmem && !mapping_pmd_thp_support(mapping));
+ VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
if (result != SCAN_SUCCEED)
--
2.43.0
* Re: [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-24 2:49 ` [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-04-24 12:40 ` David Hildenbrand (Arm)
2026-04-24 14:49 ` Zi Yan
2026-04-25 22:01 ` Andrew Morton
1 sibling, 1 reply; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:40 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>
> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>
> Add a helper function mapping_pmd_thp_support() for FSes supporting large
> folio with at least PMD_ORDER.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/pagemap.h | 9 +++++++++
> mm/khugepaged.c | 10 ++++++++--
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 31a848485ad9..5b4313d91137 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -513,6 +513,15 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi
> return mapping_max_folio_order(mapping) > 0;
> }
>
> +static inline bool mapping_pmd_thp_support(const struct address_space *mapping)
> +{
> + /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
> + VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON);
> +
> + return mapping_max_folio_order(mapping) >= PMD_ORDER;
> +}
Given mapping_large_folio_support(), I wonder whether we should call that
mapping_pmd_folio_support() ?
> +
> +
> /* Return the maximum folio size for this pagecache mapping, in bytes. */
> static inline size_t mapping_max_folio_size(const struct address_space *mapping)
> {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7d48d4fbd5f3..79f051eb6195 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2235,8 +2235,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> int nr_none = 0;
> bool is_shmem = shmem_file(file);
>
> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> + /*
> + * MADV_COLLAPSE ignores shmem huge config, so do not check shmem
> + *
> + * TODO: once shmem always calls mapping_set_large_folios() on its
> + * mapping, the shmem check can be removed.
> + */
> + VM_WARN_ON_ONCE(!is_shmem && !mapping_pmd_thp_support(mapping));
When we always make shmem set the mapping to enable large folios, can we
then drop this special casing, no?
> + VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
>
> result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
> if (result != SCAN_SUCCEED)
--
Cheers,
David
* Re: [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-24 12:40 ` David Hildenbrand (Arm)
@ 2026-04-24 14:49 ` Zi Yan
0 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 14:49 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 24 Apr 2026, at 8:40, David Hildenbrand (Arm) wrote:
> On 4/24/26 04:49, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>>
>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>>
>> Add a helper function mapping_pmd_thp_support() for FSes supporting large
>> folio with at least PMD_ORDER.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/pagemap.h | 9 +++++++++
>> mm/khugepaged.c | 10 ++++++++--
>> 2 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 31a848485ad9..5b4313d91137 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -513,6 +513,15 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi
>> return mapping_max_folio_order(mapping) > 0;
>> }
>>
>> +static inline bool mapping_pmd_thp_support(const struct address_space *mapping)
>> +{
>> + /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
>> + VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON);
>> +
>> + return mapping_max_folio_order(mapping) >= PMD_ORDER;
>> +}
>
> Given mapping_large_folio_support(), I wonder whether we should call that
> mapping_pmd_folio_support() ?
OK, will rename it.
>
>> +
>> +
>> /* Return the maximum folio size for this pagecache mapping, in bytes. */
>> static inline size_t mapping_max_folio_size(const struct address_space *mapping)
>> {
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 7d48d4fbd5f3..79f051eb6195 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2235,8 +2235,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>> int nr_none = 0;
>> bool is_shmem = shmem_file(file);
>>
>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>> + /*
>> + * MADV_COLLAPSE ignores shmem huge config, so do not check shmem
>> + *
>> + * TODO: once shmem always calls mapping_set_large_folios() on its
>> + * mapping, the shmem check can be removed.
>> + */
>> + VM_WARN_ON_ONCE(!is_shmem && !mapping_pmd_thp_support(mapping));
>
>
> When we always make shmem to set the mapping to enable large folios, can we then
> drop this special casing no?
Yes, I added the TODO above it to remind us. I have not changed the code
here to avoid a dependency on Baolin’s patch[1]. I will do that once
Baolin’s patch gets into mm-stable.
[1] https://lore.kernel.org/all/b2c7deee259a94b0d00a7c320d8d24d2c421f761.1776908112.git.baolin.wang@linux.alibaba.com/
>
>> + VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
>>
>> result = alloc_charge_folio(&new_folio, mm, cc, HPAGE_PMD_ORDER);
>> if (result != SCAN_SUCCEED)
>
>
> --
> Cheers,
>
> David
Best Regards,
Yan, Zi
* Re: [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-24 2:49 ` [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-24 12:40 ` David Hildenbrand (Arm)
@ 2026-04-25 22:01 ` Andrew Morton
2026-04-25 22:06 ` Andrew Morton
1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2026-04-25 22:01 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On Thu, 23 Apr 2026 22:49:04 -0400 Zi Yan <ziy@nvidia.com> wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>
> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>
> Add a helper function mapping_pmd_thp_support() for FSes supporting large
> folio with at least PMD_ORDER.
My arm allnoconfig blew up.
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -513,6 +513,15 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi
> return mapping_max_folio_order(mapping) > 0;
> }
>
> +static inline bool mapping_pmd_thp_support(const struct address_space *mapping)
> +{
> + /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
> + VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON);
> +
> + return mapping_max_folio_order(mapping) >= PMD_ORDER;
> +}
> +
> +
- Should it be named "mapping_pmd_thp_supported"?
- What does it actually *do*? A little comment explaining this would
be appropriate.
- Is it appropriate that this be inlined?
I added the below.
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-khugepaged-remove-read_only_thp_for_fs-check-fix
Date: Sat Apr 25 02:54:04 PM PDT 2026
fix arm64 allnoconfig by uninlining mapping_pmd_thp_support()
In file included from ./include/linux/mm.h:31,
from fs/inode.c:9:
./include/linux/pagemap.h: In function 'mapping_pmd_thp_support':
./include/linux/pgtable.h:8:26: error: 'PMD_SHIFT' undeclared (first use in this function); did you mean 'PUD_SHIFT'?
8 | #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
| ^~~~~~~~~
./include/linux/pagemap.h:521:52: note: in expansion of macro 'PMD_ORDER'
521 | return mapping_max_folio_order(mapping) >= PMD_ORDER;
| ^~~~~~~~~
./include/linux/pgtable.h:8:26: note: each undeclared identifier is reported only once for each function it appears in
8 | #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
| ^~~~~~~~~
./include/linux/pagemap.h:521:52: note: in expansion of macro 'PMD_ORDER'
521 | return mapping_max_folio_order(mapping) >= PMD_ORDER;
| ^~~~~~~~~
make[3]: *** [scripts/Makefile.build:289: fs/inode.o] Error 1
make[2]: *** [scripts/Makefile.build:548: fs] Error 2
make[1]: *** [/usr/src/25/Makefile:2139: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/pagemap.h | 9 +--------
mm/filemap.c | 8 ++++++++
2 files changed, 9 insertions(+), 8 deletions(-)
--- a/include/linux/pagemap.h~mm-khugepaged-remove-read_only_thp_for_fs-check-fix
+++ a/include/linux/pagemap.h
@@ -513,14 +513,7 @@ static inline bool mapping_large_folio_s
return mapping_max_folio_order(mapping) > 0;
}
-static inline bool mapping_pmd_thp_support(const struct address_space *mapping)
-{
- /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
- VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON);
-
- return mapping_max_folio_order(mapping) >= PMD_ORDER;
-}
-
+bool mapping_pmd_thp_support(const struct address_space *mapping);
/* Return the maximum folio size for this pagecache mapping, in bytes. */
static inline size_t mapping_max_folio_size(const struct address_space *mapping)
--- a/mm/filemap.c~mm-khugepaged-remove-read_only_thp_for_fs-check-fix
+++ a/mm/filemap.c
@@ -126,6 +126,14 @@
* ->private_lock (zap_pte_range->block_dirty_folio)
*/
+bool mapping_pmd_thp_support(const struct address_space *mapping)
+{
+ /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
+ VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON);
+
+ return mapping_max_folio_order(mapping) >= PMD_ORDER;
+}
+
static void page_cache_delete(struct address_space *mapping,
struct folio *folio, void *shadow)
{
_
* Re: [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-25 22:01 ` Andrew Morton
@ 2026-04-25 22:06 ` Andrew Morton
2026-04-25 23:44 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2026-04-25 22:06 UTC (permalink / raw)
To: Zi Yan, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
David Hildenbrand, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On Sat, 25 Apr 2026 15:01:31 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> I added the below.
Nope, still doesn't work. I guess 32-bit arm doesn't have a PMD_SHIFT.
The series has a few todo notes:
https://lore.kernel.org/f416ea70-fb70-434f-8807-6638c9787b57@kernel.org
https://sashiko.dev/#/patchset/20260413192030.3275825-1-ziy%40nvidia.com
https://lore.kernel.org/DDB6CEDD-116D-48C1-B558-D64CA2F58F55@nvidia.com
https://lore.kernel.org/896b3956-3cdc-48b5-a0a0-a3ec35aefa07@kernel.org
so I'll drop this version.
* Re: [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-25 22:06 ` Andrew Morton
@ 2026-04-25 23:44 ` Zi Yan
0 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-25 23:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 25 Apr 2026, at 18:06, Andrew Morton wrote:
> On Sat, 25 Apr 2026 15:01:31 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> I added the below.
>
> Nope, still doesn't work. I guess 32-bit arm doesn't have a PMD_SHIFT.
>
> The series has a few todo notes:
>
> https://lore.kernel.org/f416ea70-fb70-434f-8807-6638c9787b57@kernel.org
> https://sashiko.dev/#/patchset/20260413192030.3275825-1-ziy%40nvidia.com
> https://lore.kernel.org/DDB6CEDD-116D-48C1-B558-D64CA2F58F55@nvidia.com
> https://lore.kernel.org/896b3956-3cdc-48b5-a0a0-a3ec35aefa07@kernel.org
>
> so I'll drop this version.
Sure. Will fix this along with other TODOs.
--
Best Regards,
Yan, Zi
* [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:43 ` David Hildenbrand (Arm)
2026-04-26 6:01 ` Baolin Wang
2026-04-24 2:49 ` [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
` (10 subsequent siblings)
12 siblings, 2 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
This check ensures the correctness of collapsing read-only THPs for FSes
after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
PMD THP pagecache.
READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
and inode->i_writecount to prevent any write to read-only to-be-collapsed
folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
aforementioned mechanism will go away too. To ensure khugepaged functions
as expected after the changes, skip if any folio is dirty after
try_to_unmap(), since a dirty folio means this read-only folio
got some writes via mmap can happen between try_to_unmap() and
try_to_unmap_flush() via cached TLB entries and khugepaged does not support
writable pagecache folio collapse yet.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/khugepaged.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 79f051eb6195..726f8ace01af 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
}
} else if (folio_test_dirty(folio)) {
/*
- * khugepaged only works on read-only fd,
- * so this page is dirty because it hasn't
+ * This page is dirty because it hasn't
* been flushed since first write. There
* won't be new dirty pages.
*
@@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
if (!is_shmem && (folio_test_dirty(folio) ||
folio_test_writeback(folio))) {
/*
- * khugepaged only works on read-only fd, so this
- * folio is dirty because it hasn't been flushed
+ * khugepaged only works on clean file-backed folios,
+ * so this folio is dirty because it hasn't been flushed
* since first write.
*/
result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
goto out_unlock;
}
+ /*
+ * At this point, the folio is locked and unmapped. If the PTE
+ * was dirty, try_to_unmap() has transferred the dirty bit to
+ * the folio and we must not collapse it into a clean
+ * file-backed folio.
+ *
+ * If the folio is clean here, no one can write it until we
+ * drop the folio lock. A write through a stale TLB entry came
+ * from a clean PTE and must fault because the PTE has been
+ * cleared; the fault path has to take the folio lock before
+ * installing a writable mapping. Buffered write paths also
+ * have to take the folio lock before modifying file contents
+ * without a mapping, typically via write_begin_get_folio().
+ */
+ if (!is_shmem && folio_test_dirty(folio)) {
+ result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+ xas_unlock_irq(&xas);
+ folio_putback_lru(folio);
+ goto out_unlock;
+ }
+
/*
* Accumulate the folios that are being collapsed.
*/
--
2.43.0
* Re: [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
2026-04-24 2:49 ` [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
@ 2026-04-24 12:43 ` David Hildenbrand (Arm)
2026-04-26 6:01 ` Baolin Wang
1 sibling, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:43 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> This check ensures the correctness of collapse read-only THPs for FSes
> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
> PMD THP pagecache.
>
> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
> and inode->i_writecount to prevent any write to read-only to-be-collapsed
> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
> aforementioned mechanism will go away too. To ensure khugepaged functions
> as expected after the changes, skip if any folio is dirty after
> try_to_unmap(), since a dirty folio means this read-only folio
> got some writes via mmap can happen between try_to_unmap() and
> try_to_unmap_flush() via cached TLB entries and khugepaged does not support
> writable pagecache folio collapse yet.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
2026-04-24 2:49 ` [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
2026-04-24 12:43 ` David Hildenbrand (Arm)
@ 2026-04-26 6:01 ` Baolin Wang
1 sibling, 0 replies; 32+ messages in thread
From: Baolin Wang @ 2026-04-26 6:01 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 10:49 AM, Zi Yan wrote:
> This check ensures the correctness of collapse read-only THPs for FSes
> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
> PMD THP pagecache.
>
> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
> and inode->i_writecount to prevent any write to read-only to-be-collapsed
> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
> aforementioned mechanism will go away too. To ensure khugepaged functions
> as expected after the changes, skip if any folio is dirty after
> try_to_unmap(), since a dirty folio means this read-only folio
> got some writes via mmap can happen between try_to_unmap() and
> try_to_unmap_flush() via cached TLB entries and khugepaged does not support
> writable pagecache folio collapse yet.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
LGTM (Thanks to Lance for pointing out the xas lock issue in the
previous version). Feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
* [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:43 ` David Hildenbrand (Arm)
` (2 more replies)
2026-04-24 2:49 ` [PATCH 7.2 v4 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled() Zi Yan
` (9 subsequent siblings)
12 siblings, 3 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
Replace it with a check on the max folio order of the file's address space
mapping, making sure PMD THP is supported. Also remove the read-only fd
check, since collapse_file() now makes sure all to-be-collapsed folios are
clean and the created PMD file THP can be handled by FSes properly.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1f0d0b780943..f0db1390a18f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
struct inode *inode;
- if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
- return false;
-
if (!vma->vm_file)
return false;
@@ -97,7 +94,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
if (IS_ANON_FILE(inode))
return false;
- return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+ if (!mapping_pmd_thp_support(inode->i_mapping))
+ return false;
+
+ return S_ISREG(inode->i_mode);
}
/* If returns true, we are unable to access the VMA's folios. */
--
2.43.0
* Re: [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
2026-04-24 2:49 ` [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-04-24 12:43 ` David Hildenbrand (Arm)
2026-04-24 14:58 ` Zi Yan
2026-04-25 14:27 ` Zi Yan
2 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:43 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> Replace it with a check on the max folio order of the file's address space
> mapping, making sure PMD THP is supported. Also remove the read-only fd
> check, since collapse_file() now makes sure all to-be-collapsed folios are
> clean and the created PMD file THP can be handled by FSes properly.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
* Re: [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
2026-04-24 2:49 ` [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-24 12:43 ` David Hildenbrand (Arm)
@ 2026-04-24 14:58 ` Zi Yan
2026-04-25 14:27 ` Zi Yan
2 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 14:58 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
On 23 Apr 2026, at 22:49, Zi Yan wrote:
> Replace it with a check on the max folio order of the file's address space
> mapping, making sure PMD THP is supported. Also remove the read-only fd
> check, since collapse_file() now makes sure all to-be-collapsed folios are
> clean and the created PMD file THP can be handled by FSes properly.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/huge_memory.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1f0d0b780943..f0db1390a18f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
>
> - if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> - return false;
> -
> if (!vma->vm_file)
> return false;
>
> @@ -97,7 +94,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> if (IS_ANON_FILE(inode))
> return false;
>
> - return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> + if (!mapping_pmd_thp_support(inode->i_mapping))
> + return false;
sashiko told me that I need to check vma->vm_file->f_mapping, since
folio->mapping needs to match vma->vm_file->f_mapping, and inode->i_mapping
and vma->vm_file->f_mapping can diverge.
I will fix this one.
> +
> + return S_ISREG(inode->i_mode);
> }
>
> /* If returns true, we are unable to access the VMA's folios. */
> --
> 2.43.0
Best Regards,
Yan, Zi
* Re: [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
2026-04-24 2:49 ` [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-24 12:43 ` David Hildenbrand (Arm)
2026-04-24 14:58 ` Zi Yan
@ 2026-04-25 14:27 ` Zi Yan
2 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-25 14:27 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), David Hildenbrand, Lorenzo Stoakes
Cc: Song Liu, Andrew Morton, Chris Mason, David Sterba,
Alexander Viro, Christian Brauner, Jan Kara, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
On 23 Apr 2026, at 22:49, Zi Yan wrote:
> Replace it with a check on the max folio order of the file's address space
> mapping, making sure PMD THP is supported. Also remove the read-only fd
> check, since collapse_file() now makes sure all to-be-collapsed folios are
> clean and the created PMD file THP can be handled by FSes properly.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/huge_memory.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1f0d0b780943..f0db1390a18f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> struct inode *inode;
>
> - if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> - return false;
> -
> if (!vma->vm_file)
> return false;
>
> @@ -97,7 +94,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> if (IS_ANON_FILE(inode))
> return false;
>
> - return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
(resend to V4, since I accidentally replied to V3, sorry about that)
Hi Matthew, David, and Lorenzo,
After some discussions on IRC, I feel that we probably should not allow
read-write fds for PMD THP collapse at the moment. Combined with the
filemap_flush() under the folio_dirty() check in collapse_file(), khugepaged
would become a kwritebackd that scans pagecache folios and writes them back.
If we limit it to read-only fds, khugepaged would at least only write back
the pagecache folios from these fds once.
I am planning to restore inode_is_open_for_write() check in the next version.
Let me know your thoughts.
> + if (!mapping_pmd_thp_support(inode->i_mapping))
> + return false;
> +
> + return S_ISREG(inode->i_mode);
> }
>
> /* If returns true, we are unable to access the VMA's folios. */
> --
> 2.43.0
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled()
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (2 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:47 ` David Hildenbrand (Arm)
2026-04-24 2:49 ` [PATCH 7.2 v4 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
` (8 subsequent siblings)
12 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
Remove the READ_ONLY_THP_FOR_FS check; khugepaged for file-backed pmd-sized
hugepages is now enabled by the global transparent hugepage control.
khugepaged can still be enabled by per-size control for anon and shmem when
the global control is off.
Add shmem_hpage_pmd_enabled() stub for !CONFIG_SHMEM to remove
IS_ENABLED(SHMEM) in hugepage_enabled().
Clean up hugepage_enabled() by moving anon code to anon_hpage_enabled().
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/shmem_fs.h | 2 +-
mm/khugepaged.c | 26 ++++++++++++++++----------
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 93a0ba872ebe..acb8dd961b45 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -127,7 +127,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
int shmem_unuse(unsigned int type);
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && defined(CONFIG_SHMEM)
unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
loff_t write_end, bool shmem_huge_force);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 726f8ace01af..cdd4b37e4a68 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -524,26 +524,32 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
}
+static inline bool anon_hpage_enabled(void)
+{
+ if (READ_ONCE(huge_anon_orders_always))
+ return true;
+ if (READ_ONCE(huge_anon_orders_madvise))
+ return true;
+ if (READ_ONCE(huge_anon_orders_inherit) &&
+ hugepage_global_enabled())
+ return true;
+ return false;
+}
+
static bool hugepage_enabled(void)
{
/*
* We cover the anon, shmem and the file-backed case here; file-backed
- * hugepages, when configured in, are determined by the global control.
+ * hugepages are determined by the global control.
* Anon hugepages are determined by its per-size mTHP control.
* Shmem pmd-sized hugepages are also determined by its pmd-size control,
* except when the global shmem_huge is set to SHMEM_HUGE_DENY.
*/
- if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
- hugepage_global_enabled())
- return true;
- if (READ_ONCE(huge_anon_orders_always))
+ if (hugepage_global_enabled())
return true;
- if (READ_ONCE(huge_anon_orders_madvise))
- return true;
- if (READ_ONCE(huge_anon_orders_inherit) &&
- hugepage_global_enabled())
+ if (anon_hpage_enabled())
return true;
- if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
+ if (shmem_hpage_pmd_enabled())
return true;
return false;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled()
2026-04-24 2:49 ` [PATCH 7.2 v4 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled() Zi Yan
@ 2026-04-24 12:47 ` David Hildenbrand (Arm)
2026-04-24 14:59 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:47 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> Remove the READ_ONLY_THP_FOR_FS check; khugepaged for file-backed pmd-sized
> hugepages is now enabled by the global transparent hugepage control.
> khugepaged can still be enabled by per-size control for anon and shmem when
> the global control is off.
>
> Add shmem_hpage_pmd_enabled() stub for !CONFIG_SHMEM to remove
> IS_ENABLED(SHMEM) in hugepage_enabled().
>
> Clean up hugepage_enabled() by moving anon code to anon_hpage_enabled().
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/shmem_fs.h | 2 +-
> mm/khugepaged.c | 26 ++++++++++++++++----------
> 2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 93a0ba872ebe..acb8dd961b45 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -127,7 +127,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
> void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
> int shmem_unuse(unsigned int type);
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && defined(CONFIG_SHMEM)
> unsigned long shmem_allowable_huge_orders(struct inode *inode,
> struct vm_area_struct *vma, pgoff_t index,
> loff_t write_end, bool shmem_huge_force);
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 726f8ace01af..cdd4b37e4a68 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -524,26 +524,32 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
> mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
> }
>
> +static inline bool anon_hpage_enabled(void)
> +{
> + if (READ_ONCE(huge_anon_orders_always))
> + return true;
> + if (READ_ONCE(huge_anon_orders_madvise))
> + return true;
> + if (READ_ONCE(huge_anon_orders_inherit) &&
> + hugepage_global_enabled())
> + return true;
> + return false;
> +}
Ah, that is based on Nicos work, right?
> +
> static bool hugepage_enabled(void)
> {
> /*
> * We cover the anon, shmem and the file-backed case here; file-backed
> - * hugepages, when configured in, are determined by the global control.
> + * hugepages are determined by the global control.
> * Anon hugepages are determined by its per-size mTHP control.
> * Shmem pmd-sized hugepages are also determined by its pmd-size control,
> * except when the global shmem_huge is set to SHMEM_HUGE_DENY.
> */
> - if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> - hugepage_global_enabled())
> - return true;
> - if (READ_ONCE(huge_anon_orders_always))
> + if (hugepage_global_enabled())
> return true;
> - if (READ_ONCE(huge_anon_orders_madvise))
> - return true;
> - if (READ_ONCE(huge_anon_orders_inherit) &&
> - hugepage_global_enabled())
> + if (anon_hpage_enabled())
> return true;
> - if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
> + if (shmem_hpage_pmd_enabled())
> return true;
> return false;
> }
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled()
2026-04-24 12:47 ` David Hildenbrand (Arm)
@ 2026-04-24 14:59 ` Zi Yan
0 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 14:59 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 24 Apr 2026, at 8:47, David Hildenbrand (Arm) wrote:
> On 4/24/26 04:49, Zi Yan wrote:
>> Remove the READ_ONLY_THP_FOR_FS check; khugepaged for file-backed pmd-sized
>> hugepages is now enabled by the global transparent hugepage control.
>> khugepaged can still be enabled by per-size control for anon and shmem when
>> the global control is off.
>>
>> Add shmem_hpage_pmd_enabled() stub for !CONFIG_SHMEM to remove
>> IS_ENABLED(SHMEM) in hugepage_enabled().
>>
>> Clean up hugepage_enabled() by moving anon code to anon_hpage_enabled().
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/shmem_fs.h | 2 +-
>> mm/khugepaged.c | 26 ++++++++++++++++----------
>> 2 files changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
>> index 93a0ba872ebe..acb8dd961b45 100644
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -127,7 +127,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
>> void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
>> int shmem_unuse(unsigned int type);
>>
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && defined(CONFIG_SHMEM)
>> unsigned long shmem_allowable_huge_orders(struct inode *inode,
>> struct vm_area_struct *vma, pgoff_t index,
>> loff_t write_end, bool shmem_huge_force);
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 726f8ace01af..cdd4b37e4a68 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -524,26 +524,32 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
>> mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
>> }
>>
>> +static inline bool anon_hpage_enabled(void)
>> +{
>> + if (READ_ONCE(huge_anon_orders_always))
>> + return true;
>> + if (READ_ONCE(huge_anon_orders_madvise))
>> + return true;
>> + if (READ_ONCE(huge_anon_orders_inherit) &&
>> + hugepage_global_enabled())
>> + return true;
>> + return false;
>> +}
>
> Ah, that is based on Nicos work, right?
Yes, since mm-new has Nico’s patchset now.
>
>> +
>> static bool hugepage_enabled(void)
>> {
>> /*
>> * We cover the anon, shmem and the file-backed case here; file-backed
>> - * hugepages, when configured in, are determined by the global control.
>> + * hugepages are determined by the global control.
>> * Anon hugepages are determined by its per-size mTHP control.
>> * Shmem pmd-sized hugepages are also determined by its pmd-size control,
>> * except when the global shmem_huge is set to SHMEM_HUGE_DENY.
>> */
>> - if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>> - hugepage_global_enabled())
>> - return true;
>> - if (READ_ONCE(huge_anon_orders_always))
>> + if (hugepage_global_enabled())
>> return true;
>> - if (READ_ONCE(huge_anon_orders_madvise))
>> - return true;
>> - if (READ_ONCE(huge_anon_orders_inherit) &&
>> - hugepage_global_enabled())
>> + if (anon_hpage_enabled())
>> return true;
>> - if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
>> + if (shmem_hpage_pmd_enabled())
>> return true;
>> return false;
>> }
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>
> --
> Cheers,
>
> David
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (3 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled() Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
` (7 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
After removing the READ_ONLY_THP_FOR_FS check in file_thp_enabled(),
khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache
support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig
option first so that no one can use READ_ONLY_THP_FOR_FS, as upcoming
commits remove mapping->nr_thps, which its safeguard mechanism relies on.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/Kconfig | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index 0a43bb80df4f..fcb6ebde7e29 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -938,17 +938,6 @@ config THP_SWAP
For selection by architectures with reasonable THP sizes.
-config READ_ONLY_THP_FOR_FS
- bool "Read-only THP for filesystems (EXPERIMENTAL)"
- depends on TRANSPARENT_HUGEPAGE
-
- help
- Allow khugepaged to put read-only file-backed pages in THP.
-
- This is marked experimental because it is a new feature. Write
- support of file THPs will be developed in the next few release
- cycles.
-
config NO_PAGE_MAPCOUNT
bool "No per-page mapcount (EXPERIMENTAL)"
help
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 06/12] mm: fs: remove filemap_nr_thps*() functions and their users
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (4 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 07/12] fs: remove nr_thps from struct address_space Zi Yan
` (6 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
large folio support, so that read-only THPs created in these FSes are not
seen by the FSes when the underlying fd becomes writable. Now read-only PMD
THPs only appear in a FS with large folio support and the supported orders
include PMD_ORDER.
READ_ONLY_THP_FOR_FS used mapping->nr_thps, inode->i_writecount, and
smp_mb() to prevent writes to a read-only THP and to prevent collapsing
writable folios into a THP: collapse_file() increases mapping->nr_thps,
issues smp_mb(), and stops the collapse if inode->i_writecount > 0, while
do_dentry_open() first increases inode->i_writecount, issues a full memory
fence, and truncates all read-only THPs if mapping->nr_thps > 0.
Now this mechanism can be removed along with READ_ONLY_THP_FOR_FS code,
since a dirty folio check has been added after try_to_unmap() in
collapse_file() to make sure no writable folio can be collapsed.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
fs/open.c | 27 ---------------------------
include/linux/pagemap.h | 29 -----------------------------
mm/filemap.c | 1 -
mm/huge_memory.c | 1 -
mm/khugepaged.c | 28 ----------------------------
5 files changed, 86 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 681d405bc61e..c321b80027f1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -968,33 +968,6 @@ static int do_dentry_open(struct file *f,
if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT))
return -EINVAL;
- /*
- * XXX: Huge page cache doesn't support writing yet. Drop all page
- * cache for this file before processing writes.
- */
- if (f->f_mode & FMODE_WRITE) {
- /*
- * Depends on full fence from get_write_access() to synchronize
- * against collapse_file() regarding i_writecount and nr_thps
- * updates. Ensures subsequent insertion of THPs into the page
- * cache will fail.
- */
- if (filemap_nr_thps(inode->i_mapping)) {
- struct address_space *mapping = inode->i_mapping;
-
- filemap_invalidate_lock(inode->i_mapping);
- /*
- * unmap_mapping_range just need to be called once
- * here, because the private pages is not need to be
- * unmapped mapping (e.g. data segment of dynamic
- * shared libraries here).
- */
- unmap_mapping_range(mapping, 0, 0, 0);
- truncate_inode_pages(mapping, 0);
- filemap_invalidate_unlock(inode->i_mapping);
- }
- }
-
return 0;
cleanup_all:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5b4313d91137..88e58ca79bb5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -528,35 +528,6 @@ static inline size_t mapping_max_folio_size(const struct address_space *mapping)
return PAGE_SIZE << mapping_max_folio_order(mapping);
}
-static inline int filemap_nr_thps(const struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- return atomic_read(&mapping->nr_thps);
-#else
- return 0;
-#endif
-}
-
-static inline void filemap_nr_thps_inc(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- if (!mapping_large_folio_support(mapping))
- atomic_inc(&mapping->nr_thps);
-#else
- WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
-static inline void filemap_nr_thps_dec(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- if (!mapping_large_folio_support(mapping))
- atomic_dec(&mapping->nr_thps);
-#else
- WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
struct address_space *folio_mapping(const struct folio *folio);
/**
diff --git a/mm/filemap.c b/mm/filemap.c
index 4e636647100c..d3cd4d2f3734 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -189,7 +189,6 @@ static void filemap_unaccount_folio(struct address_space *mapping,
lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
} else if (folio_test_pmd_mappable(folio)) {
lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
- filemap_nr_thps_dec(mapping);
}
if (test_bit(AS_KERNEL_FILE, &folio->mapping->flags))
mod_node_page_state(folio_pgdat(folio),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f0db1390a18f..8b85a3e58b00 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3937,7 +3937,6 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
} else {
lruvec_stat_mod_folio(folio,
NR_FILE_THPS, -nr);
- filemap_nr_thps_dec(mapping);
}
}
}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cdd4b37e4a68..4de7e30c4b71 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2469,21 +2469,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
goto xa_unlocked;
}
- if (!is_shmem) {
- filemap_nr_thps_inc(mapping);
- /*
- * Paired with the fence in do_dentry_open() -> get_write_access()
- * to ensure i_writecount is up to date and the update to nr_thps
- * is visible. Ensures the page cache will be truncated if the
- * file is opened writable.
- */
- smp_mb();
- if (inode_is_open_for_write(mapping->host)) {
- result = SCAN_FAIL;
- filemap_nr_thps_dec(mapping);
- }
- }
-
xa_locked:
xas_unlock_irq(&xas);
xa_unlocked:
@@ -2661,19 +2646,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
folio_putback_lru(folio);
folio_put(folio);
}
- /*
- * Undo the updates of filemap_nr_thps_inc for non-SHMEM
- * file only. This undo is not needed unless failure is
- * due to SCAN_COPY_MC.
- */
- if (!is_shmem && result == SCAN_COPY_MC) {
- filemap_nr_thps_dec(mapping);
- /*
- * Paired with the fence in do_dentry_open() -> get_write_access()
- * to ensure the update to nr_thps is visible.
- */
- smp_mb();
- }
new_folio->mapping = NULL;
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 07/12] fs: remove nr_thps from struct address_space
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (5 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
` (5 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
With filemap_nr_thps*() removed, the related field, address_space->nr_thps,
is no longer needed. Remove it. This shrinks struct address_space by 8
bytes on 64-bit systems, which may increase the number of inodes we can
cache.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
fs/inode.c | 3 ---
include/linux/fs.h | 5 -----
2 files changed, 8 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 69e219f0cfcb..35399f60718e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -279,9 +279,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
mapping->flags = 0;
mapping->wb_err = 0;
atomic_set(&mapping->i_mmap_writable, 0);
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- atomic_set(&mapping->nr_thps, 0);
-#endif
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
mapping->writeback_index = 0;
init_rwsem(&mapping->invalidate_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..bb9cc4f7207c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -460,7 +460,6 @@ struct mapping_metadata_bhs {
* memory mappings.
* @gfp_mask: Memory allocation flags to use for allocating pages.
* @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
- * @nr_thps: Number of THPs in the pagecache (non-shmem only).
* @i_mmap: Tree of private and shared mappings.
* @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
* @nrpages: Number of page entries, protected by the i_pages lock.
@@ -476,10 +475,6 @@ struct address_space {
struct rw_semaphore invalidate_lock;
gfp_t gfp_mask;
atomic_t i_mmap_writable;
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- /* number of thp, only for non-shmem files */
- atomic_t nr_thps;
-#endif
struct rb_root_cached i_mmap;
unsigned long nrpages;
pgoff_t writeback_index;
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (6 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 07/12] fs: remove nr_thps from struct address_space Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:48 ` David Hildenbrand (Arm)
2026-04-24 2:49 ` [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
` (4 subsequent siblings)
12 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
Without READ_ONLY_THP_FOR_FS, large file-backed folios cannot be created by
a FS without large folio support. The check is no longer needed.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 30 +++---------------------------
1 file changed, 3 insertions(+), 27 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8b85a3e58b00..a76ddc63195a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3832,33 +3832,9 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
if (!folio->mapping && !folio_test_anon(folio))
return -EBUSY;
- if (folio_test_anon(folio)) {
- /* order-1 is not supported for anonymous THP. */
- if (new_order == 1)
- return -EINVAL;
- } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
- if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
- !mapping_large_folio_support(folio->mapping)) {
- /*
- * We can always split a folio down to a single page
- * (new_order == 0) uniformly.
- *
- * For any other scenario
- * a) uniform split targeting a large folio
- * (new_order > 0)
- * b) any non-uniform split
- * we must confirm that the file system supports large
- * folios.
- *
- * Note that we might still have THPs in such
- * mappings, which is created from khugepaged when
- * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
- * case, the mapping does not actually support large
- * folios properly.
- */
- return -EINVAL;
- }
- }
+ /* order-1 is not supported for anonymous THP. */
+ if (folio_test_anon(folio) && new_order == 1)
+ return -EINVAL;
/*
* swapcache folio could only be split to order 0
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
2026-04-24 2:49 ` [PATCH 7.2 v4 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-24 12:48 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:48 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> Without READ_ONLY_THP_FOR_FS, large file-backed folios cannot be created by
> a FS without large folio support. The check is no longer needed.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (7 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:54 ` David Hildenbrand (Arm)
2026-04-24 2:49 ` [PATCH 7.2 v4 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
` (3 subsequent siblings)
12 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
After READ_ONLY_THP_FOR_FS is removed, a FS either supports large folios or
it does not. folio_split() can be used on a FS with large folio support
without worrying about encountering a THP on a FS without large folio
support.
When READ_ONLY_THP_FOR_FS was present, a PMD large pagecache folio could
appear in a FS without large folio support after khugepaged or
madvise(MADV_COLLAPSE) created it. During truncate_inode_partial_folio(),
such a PMD large pagecache folio had to be split to order-0 folios, since a
FS without large folio support could not handle a non-uniform split into
folios with various orders. try_folio_split_to_order() was added to handle
this situation by checking folio_check_splittable(...,
SPLIT_TYPE_NON_UNIFORM) to detect whether the large folio was created by
READ_ONLY_THP_FOR_FS on a FS without large folio support. Now that
READ_ONLY_THP_FOR_FS is removed, all large pagecache folios are created on
FSes supporting large folios, so this function is no longer needed and all
large pagecache folios can be split non-uniformly.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/huge_mm.h | 25 ++-----------------------
mm/truncate.c | 8 ++++----
2 files changed, 6 insertions(+), 27 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 48496f09909b..127f9e1e7604 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -394,27 +394,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
return split_huge_page_to_list_to_order(page, NULL, new_order);
}
-/**
- * try_folio_split_to_order() - try to split a @folio at @page to @new_order
- * using non uniform split.
- * @folio: folio to be split
- * @page: split to @new_order at the given page
- * @new_order: the target split order
- *
- * Try to split a @folio at @page using non uniform split to @new_order, if
- * non uniform split is not supported, fall back to uniform split. After-split
- * folios are put back to LRU list. Use min_order_for_split() to get the lower
- * bound of @new_order.
- *
- * Return: 0 - split is successful, otherwise split failed.
- */
-static inline int try_folio_split_to_order(struct folio *folio,
- struct page *page, unsigned int new_order)
-{
- if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
- return split_huge_page_to_order(&folio->page, new_order);
- return folio_split(folio, new_order, page, NULL);
-}
static inline int split_huge_page(struct page *page)
{
return split_huge_page_to_list_to_order(page, NULL, 0);
@@ -647,8 +626,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
return -EINVAL;
}
-static inline int try_folio_split_to_order(struct folio *folio,
- struct page *page, unsigned int new_order)
+static inline int folio_split(struct folio *folio, unsigned int new_order,
+ struct page *page, struct list_head *list)
{
VM_WARN_ON_ONCE_FOLIO(1, folio);
return -EINVAL;
diff --git a/mm/truncate.c b/mm/truncate.c
index 12cc89f89afc..b58ba940be47 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -177,7 +177,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
return 0;
}
-static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
+static int folio_split_or_unmap(struct folio *folio, struct page *split_at,
unsigned long min_order)
{
enum ttu_flags ttu_flags =
@@ -186,7 +186,7 @@ static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
TTU_IGNORE_MLOCK;
int ret;
- ret = try_folio_split_to_order(folio, split_at, min_order);
+ ret = folio_split(folio, min_order, split_at, NULL);
/*
* If the split fails, unmap the folio, so it will be refaulted
@@ -252,7 +252,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
min_order = mapping_min_folio_order(folio->mapping);
split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
- if (!try_folio_split_or_unmap(folio, split_at, min_order)) {
+ if (!folio_split_or_unmap(folio, split_at, min_order)) {
/*
* try to split at offset + length to make sure folios within
* the range can be dropped, especially to avoid memory waste
@@ -279,7 +279,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
/* make sure folio2 is large and does not change its mapping */
if (folio_test_large(folio2) &&
folio2->mapping == folio->mapping)
- try_folio_split_or_unmap(folio2, split_at2, min_order);
+ folio_split_or_unmap(folio2, split_at2, min_order);
folio_unlock(folio2);
out:
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
2026-04-24 2:49 ` [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
@ 2026-04-24 12:54 ` David Hildenbrand (Arm)
2026-04-24 15:07 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:54 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or
> not. folio_split() can be used on a FS with large folio support without
> worrying about getting a THP on a FS without large folio support.
>
> When READ_ONLY_THP_FOR_FS was present, a PMD large pagecache folio can
> appear in a FS without large folio support after khugepaged or
> madvise(MADV_COLLAPSE) creates it. During truncate_inode_partial_folio(),
> such a PMD large pagecache folio is split and if the FS does not support
> large folio, it needs to be split to order-0 ones and could not be split
> non uniformly to ones with various orders. try_folio_split_to_order() was
> added to handle this situation by checking folio_check_splittable(...,
> SPLIT_TYPE_NON_UNIFORM) to detect if the large folio is created due to
> READ_ONLY_THP_FOR_FS and the FS does not support large folio. Now
> READ_ONLY_THP_FOR_FS is removed, all large pagecache folios are created
> with FSes supporting large folio, this function is no longer needed and all
> large pagecache folios can be split non uniformly.
In general looks good. Just one question:
folio_check_splittable() also rejects SPLIT_TYPE_NON_UNIFORM / new_order for
swapcache pages.
It's confusing, but truncate would never stumble over such folios, correct?
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
2026-04-24 12:54 ` David Hildenbrand (Arm)
@ 2026-04-24 15:07 ` Zi Yan
2026-04-24 15:12 ` Zi Yan
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-04-24 15:07 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 24 Apr 2026, at 8:54, David Hildenbrand (Arm) wrote:
> On 4/24/26 04:49, Zi Yan wrote:
>
> In general looks good. Just one question:
>
> folio_check_splittable() also rejects SPLIT_TYPE_NON_UNIFORM / new_order for
> swapcache pages.
Because swapcache only supports order-0 and PMD-order folios so far.
>
> It's confusing, but truncate would never stumble over such folios, correct?
Right.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
2026-04-24 15:07 ` Zi Yan
@ 2026-04-24 15:12 ` Zi Yan
2026-04-24 18:38 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-04-24 15:12 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 24 Apr 2026, at 11:07, Zi Yan wrote:
> On 24 Apr 2026, at 8:54, David Hildenbrand (Arm) wrote:
>
>> On 4/24/26 04:49, Zi Yan wrote:
>>
>> In general looks good. Just one question:
>>
>> folio_check_splittable() also rejects SPLIT_TYPE_NON_UNIFORM / new_order for
>> swapcache pages.
>
> Since swapcache only supports PMD order or 0 order yet.
>
>>
>> It's confusing, but truncate would never stumble over such folios, correct?
>
> Right.
BTW, sashiko had a similar question when it reviewed v2[1]. It also complained
that when the split fails, the entire folio remains in memory. But that is the
consequence of a failed split, unless we want to keep retrying the split.
[1] https://sashiko.dev/#/patchset/20260413192030.3275825-1-ziy%40nvidia.com?part=9
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
2026-04-24 15:12 ` Zi Yan
@ 2026-04-24 18:38 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 18:38 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, Matthew Wilcox (Oracle), Song Liu, Chris Mason,
David Sterba, Alexander Viro, Christian Brauner, Jan Kara,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 17:12, Zi Yan wrote:
> On 24 Apr 2026, at 11:07, Zi Yan wrote:
>
>> On 24 Apr 2026, at 8:54, David Hildenbrand (Arm) wrote:
>>
>>>
>>> In general looks good. Just one question:
>>>
>>> folio_check_splittable() also rejects SPLIT_TYPE_NON_UNIFORM / new_order for
>>> swapcache pages.
>>
>> Since swapcache only supports PMD order or 0 order yet.
>>
>>>
>>> It's confusing, but truncate would never stumble over such folios, correct?
>>
>> Right.
>
> BTW, sashiko had a similar question when it reviewed v2[1]. It also complained
> about when the split fails, the entire folio remains in memory. But that is the
> consequence of failed split, unless we want to keep retrying the split.
Retry isn't guaranteed to make progress, so it's the current expected behavior.
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 7.2 v4 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (8 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
` (2 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
READ_ONLY_THP_FOR_FS is no longer present, so remove the related comment.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/defrag.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 7e2db5d3a4d4..a8d49d9ca981 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -860,9 +860,6 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t
return folio;
/*
- * Since we can defragment files opened read-only, we can encounter
- * transparent huge pages here (see CONFIG_READ_ONLY_THP_FOR_FS).
- *
* The IO for such large folios is not fully tested, thus return
* an error to reject such folios unless it's an experimental build.
*
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 7.2 v4 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (9 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 2:49 ` [PATCH 7.2 v4 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-04-24 10:30 ` [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Andrew Morton
12 siblings, 0 replies; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
Change the requirement to a file system with large folio support whose
supported orders include PMD_ORDER.
Also add tests that open a file with read-write permission and populate
folios with writes. Reuse the XFS image from split_huge_page_test.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
tools/testing/selftests/mm/khugepaged.c | 110 ++++++++++++++++------
tools/testing/selftests/mm/run_vmtests.sh | 12 ++-
2 files changed, 90 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 3fe7ef04ac62..627472cbc910 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -49,7 +49,8 @@ struct mem_ops {
const char *name;
};
-static struct mem_ops *file_ops;
+static struct mem_ops *read_only_file_ops;
+static struct mem_ops *read_write_file_ops;
static struct mem_ops *anon_ops;
static struct mem_ops *shmem_ops;
@@ -112,7 +113,8 @@ static void restore_settings(int sig)
static void save_settings(void)
{
printf("Save THP and khugepaged settings...");
- if (file_ops && finfo.type == VMA_FILE)
+ if ((read_only_file_ops || read_write_file_ops) &&
+ finfo.type == VMA_FILE)
thp_set_read_ahead_path(finfo.dev_queue_read_ahead_path);
thp_save_settings();
@@ -364,11 +366,14 @@ static bool anon_check_huge(void *addr, int nr_hpages)
return check_huge_anon(addr, nr_hpages, hpage_pmd_size);
}
-static void *file_setup_area(int nr_hpages)
+static void *file_setup_area_common(int nr_hpages, bool read_only)
{
int fd;
void *p;
unsigned long size;
+ int open_opt = read_only ? O_RDONLY : O_RDWR;
+ int mmap_prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE);
+ int mmap_opt = read_only ? MAP_PRIVATE : MAP_SHARED;
unlink(finfo.path); /* Cleanup from previous failed tests */
printf("Creating %s for collapse%s...", finfo.path,
@@ -388,14 +393,15 @@ static void *file_setup_area(int nr_hpages)
munmap(p, size);
success("OK");
- printf("Opening %s read only for collapse...", finfo.path);
- finfo.fd = open(finfo.path, O_RDONLY, 777);
+ printf("Opening %s %s for collapse...", finfo.path,
+ read_only ? "read only" : "read-write");
+ finfo.fd = open(finfo.path, open_opt, 777);
if (finfo.fd < 0) {
perror("open()");
exit(EXIT_FAILURE);
}
- p = mmap(BASE_ADDR, size, PROT_READ,
- MAP_PRIVATE, finfo.fd, 0);
+ p = mmap(BASE_ADDR, size, mmap_prot,
+ mmap_opt, finfo.fd, 0);
if (p == MAP_FAILED || p != BASE_ADDR) {
perror("mmap()");
exit(EXIT_FAILURE);
@@ -407,6 +413,15 @@ static void *file_setup_area(int nr_hpages)
return p;
}
+static void *file_setup_read_only_area(int nr_hpages)
+{
+ return file_setup_area_common(nr_hpages, /* read_only= */ true);
+}
+
+static void *file_setup_read_write_area(int nr_hpages)
+{
+ return file_setup_area_common(nr_hpages, /* read_only= */ false);
+}
static void file_cleanup_area(void *p, unsigned long size)
{
munmap(p, size);
@@ -414,14 +429,25 @@ static void file_cleanup_area(void *p, unsigned long size)
unlink(finfo.path);
}
-static void file_fault(void *p, unsigned long start, unsigned long end)
+static void file_fault_common(void *p, unsigned long start, unsigned long end,
+ int madv_ops)
{
- if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) {
+ if (madvise(((char *)p) + start, end - start, madv_ops)) {
perror("madvise(MADV_POPULATE_READ");
exit(EXIT_FAILURE);
}
}
+static void file_fault_read(void *p, unsigned long start, unsigned long end)
+{
+ file_fault_common(p, start, end, MADV_POPULATE_READ);
+}
+
+static void file_fault_write(void *p, unsigned long start, unsigned long end)
+{
+ file_fault_common(p, start, end, MADV_POPULATE_WRITE);
+}
+
static bool file_check_huge(void *addr, int nr_hpages)
{
switch (finfo.type) {
@@ -477,10 +503,18 @@ static struct mem_ops __anon_ops = {
.name = "anon",
};
-static struct mem_ops __file_ops = {
- .setup_area = &file_setup_area,
+static struct mem_ops __read_only_file_ops = {
+ .setup_area = &file_setup_read_only_area,
+ .cleanup_area = &file_cleanup_area,
+ .fault = &file_fault_read,
+ .check_huge = &file_check_huge,
+ .name = "file",
+};
+
+static struct mem_ops __read_write_file_ops = {
+ .setup_area = &file_setup_read_write_area,
.cleanup_area = &file_cleanup_area,
- .fault = &file_fault,
+ .fault = &file_fault_write,
.check_huge = &file_check_huge,
.name = "file",
};
@@ -603,7 +637,9 @@ static struct collapse_context __madvise_context = {
static bool is_tmpfs(struct mem_ops *ops)
{
- return ops == &__file_ops && finfo.type == VMA_SHMEM;
+ return (ops == &__read_only_file_ops ||
+ ops == &__read_write_file_ops) &&
+ finfo.type == VMA_SHMEM;
}
static bool is_anon(struct mem_ops *ops)
@@ -1086,8 +1122,8 @@ static void usage(void)
fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
- fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
- fprintf(stderr, "\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
+ fprintf(stderr, "\n\t\"file,all\" mem_type requires a file system\n");
+ fprintf(stderr, "\twith large folio support (order >= PMD order)\n");
fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
fprintf(stderr, "\tmounted with huge=advise option for khugepaged tests to work\n");
fprintf(stderr, "\n\tSupported Options:\n");
@@ -1143,20 +1179,22 @@ static void parse_test_type(int argc, char **argv)
usage();
if (!strcmp(buf, "all")) {
- file_ops = &__file_ops;
+ read_only_file_ops = &__read_only_file_ops;
+ read_write_file_ops = &__read_write_file_ops;
anon_ops = &__anon_ops;
shmem_ops = &__shmem_ops;
} else if (!strcmp(buf, "anon")) {
anon_ops = &__anon_ops;
} else if (!strcmp(buf, "file")) {
- file_ops = &__file_ops;
+ read_only_file_ops = &__read_only_file_ops;
+ read_write_file_ops = &__read_write_file_ops;
} else if (!strcmp(buf, "shmem")) {
shmem_ops = &__shmem_ops;
} else {
usage();
}
- if (!file_ops)
+ if (!read_only_file_ops && !read_write_file_ops)
return;
if (argc != 2)
@@ -1228,37 +1266,47 @@ int main(int argc, char **argv)
} while (0)
TEST(collapse_full, khugepaged_context, anon_ops);
- TEST(collapse_full, khugepaged_context, file_ops);
+ TEST(collapse_full, khugepaged_context, read_only_file_ops);
+ TEST(collapse_full, khugepaged_context, read_write_file_ops);
TEST(collapse_full, khugepaged_context, shmem_ops);
TEST(collapse_full, madvise_context, anon_ops);
- TEST(collapse_full, madvise_context, file_ops);
+ TEST(collapse_full, madvise_context, read_only_file_ops);
+ TEST(collapse_full, madvise_context, read_write_file_ops);
TEST(collapse_full, madvise_context, shmem_ops);
TEST(collapse_empty, khugepaged_context, anon_ops);
TEST(collapse_empty, madvise_context, anon_ops);
TEST(collapse_single_pte_entry, khugepaged_context, anon_ops);
- TEST(collapse_single_pte_entry, khugepaged_context, file_ops);
+ TEST(collapse_single_pte_entry, khugepaged_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry, khugepaged_context, read_write_file_ops);
TEST(collapse_single_pte_entry, khugepaged_context, shmem_ops);
TEST(collapse_single_pte_entry, madvise_context, anon_ops);
- TEST(collapse_single_pte_entry, madvise_context, file_ops);
+ TEST(collapse_single_pte_entry, madvise_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry, madvise_context, read_write_file_ops);
TEST(collapse_single_pte_entry, madvise_context, shmem_ops);
TEST(collapse_max_ptes_none, khugepaged_context, anon_ops);
- TEST(collapse_max_ptes_none, khugepaged_context, file_ops);
+ TEST(collapse_max_ptes_none, khugepaged_context, read_only_file_ops);
+ TEST(collapse_max_ptes_none, khugepaged_context, read_write_file_ops);
TEST(collapse_max_ptes_none, madvise_context, anon_ops);
- TEST(collapse_max_ptes_none, madvise_context, file_ops);
+ TEST(collapse_max_ptes_none, madvise_context, read_only_file_ops);
+ TEST(collapse_max_ptes_none, madvise_context, read_write_file_ops);
TEST(collapse_single_pte_entry_compound, khugepaged_context, anon_ops);
- TEST(collapse_single_pte_entry_compound, khugepaged_context, file_ops);
+ TEST(collapse_single_pte_entry_compound, khugepaged_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry_compound, khugepaged_context, read_write_file_ops);
TEST(collapse_single_pte_entry_compound, madvise_context, anon_ops);
- TEST(collapse_single_pte_entry_compound, madvise_context, file_ops);
+ TEST(collapse_single_pte_entry_compound, madvise_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry_compound, madvise_context, read_write_file_ops);
TEST(collapse_full_of_compound, khugepaged_context, anon_ops);
- TEST(collapse_full_of_compound, khugepaged_context, file_ops);
+ TEST(collapse_full_of_compound, khugepaged_context, read_only_file_ops);
+ TEST(collapse_full_of_compound, khugepaged_context, read_write_file_ops);
TEST(collapse_full_of_compound, khugepaged_context, shmem_ops);
TEST(collapse_full_of_compound, madvise_context, anon_ops);
- TEST(collapse_full_of_compound, madvise_context, file_ops);
+ TEST(collapse_full_of_compound, madvise_context, read_only_file_ops);
+ TEST(collapse_full_of_compound, madvise_context, read_write_file_ops);
TEST(collapse_full_of_compound, madvise_context, shmem_ops);
TEST(collapse_compound_extreme, khugepaged_context, anon_ops);
@@ -1280,10 +1328,12 @@ int main(int argc, char **argv)
TEST(collapse_max_ptes_shared, madvise_context, anon_ops);
TEST(madvise_collapse_existing_thps, madvise_context, anon_ops);
- TEST(madvise_collapse_existing_thps, madvise_context, file_ops);
+ TEST(madvise_collapse_existing_thps, madvise_context, read_only_file_ops);
+ TEST(madvise_collapse_existing_thps, madvise_context, read_write_file_ops);
TEST(madvise_collapse_existing_thps, madvise_context, shmem_ops);
- TEST(madvise_retracted_page_tables, madvise_context, file_ops);
+ TEST(madvise_retracted_page_tables, madvise_context, read_only_file_ops);
+ TEST(madvise_retracted_page_tables, madvise_context, read_write_file_ops);
TEST(madvise_retracted_page_tables, madvise_context, shmem_ops);
restore_settings(0);
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index d8468451b3a3..50dd6b6d0225 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -489,8 +489,6 @@ CATEGORY="thp" run_test ./khugepaged all:shmem
CATEGORY="thp" run_test ./khugepaged -s 4 all:shmem
-CATEGORY="thp" run_test ./transhuge-stress -d 20
-
# Try to create XFS if not provided
if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
if [ "${HAVE_HUGEPAGES}" = "1" ]; then
@@ -507,6 +505,14 @@ if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
fi
fi
+if [ -n "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
+CATEGORY="thp" run_test ./khugepaged all:file ${SPLIT_HUGE_PAGE_TEST_XFS_PATH}
+else
+ count_total=$(( count_total + 1 ))
+ count_skip=$(( count_skip + 1 ))
+ echo "[SKIP] ./khugepaged all:file" | tap_prefix
+fi
+
CATEGORY="thp" run_test ./split_huge_page_test ${SPLIT_HUGE_PAGE_TEST_XFS_PATH}
if [ -n "${MOUNTED_XFS}" ]; then
@@ -515,6 +521,8 @@ if [ -n "${MOUNTED_XFS}" ]; then
rm -f ${XFS_IMG}
fi
+CATEGORY="thp" run_test ./transhuge-stress -d 20
+
CATEGORY="thp" run_test ./folio_split_race_test
CATEGORY="migration" run_test ./migration
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread* [PATCH 7.2 v4 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (10 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
@ 2026-04-24 2:49 ` Zi Yan
2026-04-24 12:59 ` David Hildenbrand (Arm)
2026-04-24 10:30 ` [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Andrew Morton
12 siblings, 1 reply; 32+ messages in thread
From: Zi Yan @ 2026-04-24 2:49 UTC (permalink / raw)
To: Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
linux-fsdevel, linux-mm, linux-kselftest
Any file system with large folio support whose supported orders include
PMD_ORDER can be used. There is no need to open the file read-only.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
tools/testing/selftests/mm/guard-regions.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
index 48e8b1539be3..117639891953 100644
--- a/tools/testing/selftests/mm/guard-regions.c
+++ b/tools/testing/selftests/mm/guard-regions.c
@@ -2203,17 +2203,6 @@ TEST_F(guard_regions, collapse)
if (variant->backing != ANON_BACKED)
ASSERT_EQ(ftruncate(self->fd, size), 0);
- /*
- * We must close and re-open local-file backed as read-only for
- * CONFIG_READ_ONLY_THP_FOR_FS to work.
- */
- if (variant->backing == LOCAL_FILE_BACKED) {
- ASSERT_EQ(close(self->fd), 0);
-
- self->fd = open(self->path, O_RDONLY);
- ASSERT_GE(self->fd, 0);
- }
-
ptr = mmap_(self, variant, NULL, size, PROT_READ, 0, 0);
ASSERT_NE(ptr, MAP_FAILED);
@@ -2237,9 +2226,10 @@ TEST_F(guard_regions, collapse)
/*
* Now collapse the entire region. This should fail in all cases.
*
- * The madvise() call will also fail if CONFIG_READ_ONLY_THP_FOR_FS is
- * not set for the local file case, but we can't differentiate whether
- * this occurred or if the collapse was rightly rejected.
+ * The madvise() call will also fail if the file system does not support
+ * large folio or the supported orders do not include PMD_ORDER for the
+ * local file case, but we can't differentiate whether this occurred or
+ * if the collapse was rightly rejected.
*/
EXPECT_NE(madvise(ptr, size, MADV_COLLAPSE), 0);
--
2.43.0
^ permalink raw reply related [flat|nested] 32+ messages in thread* Re: [PATCH 7.2 v4 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions
2026-04-24 2:49 ` [PATCH 7.2 v4 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
@ 2026-04-24 12:59 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 12:59 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On 4/24/26 04:49, Zi Yan wrote:
> Any file system with large folio support and the supported orders include
> PMD_ORDER can be used. There is no need to open a file with read-only.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> tools/testing/selftests/mm/guard-regions.c | 18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
> index 48e8b1539be3..117639891953 100644
> --- a/tools/testing/selftests/mm/guard-regions.c
> +++ b/tools/testing/selftests/mm/guard-regions.c
> @@ -2203,17 +2203,6 @@ TEST_F(guard_regions, collapse)
> if (variant->backing != ANON_BACKED)
> ASSERT_EQ(ftruncate(self->fd, size), 0);
>
> - /*
> - * We must close and re-open local-file backed as read-only for
> - * CONFIG_READ_ONLY_THP_FOR_FS to work.
> - */
> - if (variant->backing == LOCAL_FILE_BACKED) {
> - ASSERT_EQ(close(self->fd), 0);
> -
> - self->fd = open(self->path, O_RDONLY);
> - ASSERT_GE(self->fd, 0);
> - }
What if someone runs this with a filesystem that does not support large folios?
Would we want an allowlist for known-good fs'es, similar to how we handle
gup_longterm.c, and SKIP otherwise, because we know that MADV_COLLAPSE would
always fail for a different reason?
In any case, the test would not misbehave if passed an unsupported FS, so LGTM
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support
2026-04-24 2:49 [PATCH 7.2 v4 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (11 preceding siblings ...)
2026-04-24 2:49 ` [PATCH 7.2 v4 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
@ 2026-04-24 10:30 ` Andrew Morton
12 siblings, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2026-04-24 10:30 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
linux-kselftest
On Thu, 23 Apr 2026 22:49:03 -0400 Zi Yan <ziy@nvidia.com> wrote:
> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
> read-only THPs for FSes with large folio support (the supported orders
> need to include PMD_ORDER) by default. It is on top of mm-new.
Thanks, I'll add this for testing.
4 of 12 aren't yet reviewed, but that's what we have akpm nagmails for ;)
^ permalink raw reply [flat|nested] 32+ messages in thread