* Re: [PATCH v5 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() [not found] ` <20260429152924.727124-4-ziy@nvidia.com> @ 2026-05-07 4:29 ` Lance Yang 2026-05-08 19:43 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 23+ messages in thread From: Lance Yang @ 2026-05-07 4:29 UTC (permalink / raw) To: ziy Cc: akpm, david, willy, songliubraving, clm, dsterba, viro, brauner, jack, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Wed, Apr 29, 2026 at 11:29:13AM -0400, Zi Yan wrote: >Replace it with a check on the max folio order of the file's address space >mapping, making sure PMD folio is supported. Keep the inode open-for-write >check, since even if collapse_file() now makes sure all to-be-collapsed >folios are clean and the created PMD file THP can be handled by FSes >properly, the filemap_flush() could perform undesirable write back. > >Signed-off-by: Zi Yan <ziy@nvidia.com> >--- LGTM. Reviewed-by: Lance Yang <lance.yang@linux.dev> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() [not found] ` <20260429152924.727124-4-ziy@nvidia.com> 2026-05-07 4:29 ` [PATCH v5 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Lance Yang @ 2026-05-08 19:43 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 19:43 UTC (permalink / raw) To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4/29/26 17:29, Zi Yan wrote: > Replace it with a check on the max folio order of the file's address space > mapping, making sure PMD folio is supported. Keep the inode open-for-write > check, since even if collapse_file() now makes sure all to-be-collapsed > folios are clean and the created PMD file THP can be handled by FSes > properly, the filemap_flush() could perform undesirable write back. > > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- Acked-by: David Hildenbrand (Arm) <david@kernel.org> -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20260429152924.727124-2-ziy@nvidia.com>]
* Re: [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check [not found] ` <20260429152924.727124-2-ziy@nvidia.com> @ 2026-05-07 6:08 ` Zi Yan 2026-05-07 6:57 ` Zi Yan 2026-05-08 19:39 ` David Hildenbrand (Arm) 1 sibling, 1 reply; 23+ messages in thread From: Zi Yan @ 2026-05-07 6:08 UTC (permalink / raw) To: Andrew Morton, Nico Pache, Lance Yang Cc: Matthew Wilcox (Oracle), David Hildenbrand, Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 29 Apr 2026, at 23:29, Zi Yan wrote: > collapse_file() requires FSes supporting large folio with at least > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. > MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem. > > While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE. > > Add a helper function mapping_pmd_folio_support() for FSes supporting large > folio with at least PMD_ORDER. > > Signed-off-by: Zi Yan <ziy@nvidia.com> > Reviewed-by: Lance Yang <lance.yang@linux.dev> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > include/linux/pagemap.h | 26 ++++++++++++++++++++++++++ > mm/khugepaged.c | 10 ++++++++-- > 2 files changed, 34 insertions(+), 2 deletions(-) > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index 1f50991b43e3b..1fed3414fe9b8 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -513,6 +513,32 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi > return mapping_max_folio_order(mapping) > 0; > } > > +/** > + * mapping_pmd_folio_support() - Check if a mapping support PMD-sized folio > + * @mapping: The address_space > + * > + * Some file supports large folio but does not support as large as PMD order. 
> + * If a PMD-sized pagecache folio is attempted to be created on a filesystem, > + * this check needs to be performed first. > + * > + * Return: true - PMD-sized folio is supported, false - PMD-sized folio is not > + * supported. > + */ > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > +static inline bool mapping_pmd_folio_support(const struct address_space *mapping) > +{ > + /* AS_FOLIO_ORDER is only reasonable for pagecache folios */ > + VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON); > + > + return mapping_max_folio_order(mapping) >= PMD_ORDER; Hi Andrew, Can you help apply the fixup below? It addresses the concern raised by Nico and Lance[1]. Thanks. [1] https://lore.kernel.org/all/aa778cfc-b7f8-4100-89bb-d2b2ef8e1138@redhat.com/ From 9c1749df6516c06b1628ac7804bc7a6ab5709f37 Mon Sep 17 00:00:00 2001 From: Zi Yan <ziy@nvidia.com> Date: Thu, 7 May 2026 02:02:28 -0400 Subject: [PATCH] fix mapping_pmd_folio_support() to represent its exact meaning Signed-off-by: Zi Yan <ziy@nvidia.com> --- include/linux/pagemap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index c6a4ecd3d6ed1..52895896f3357 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -530,7 +530,8 @@ static inline bool mapping_pmd_folio_support(const struct address_space *mapping /* AS_FOLIO_ORDER is only reasonable for pagecache folios */ VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON); - return mapping_max_folio_order(mapping) >= PMD_ORDER; + return mapping_min_folio_order(mapping) <= PMD_ORDER && + mapping_max_folio_order(mapping) >= PMD_ORDER } #else static inline bool mapping_pmd_folio_support(const struct address_space *mapping) -- 2.53.0 Best Regards, Yan, Zi ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check 2026-05-07 6:08 ` [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan @ 2026-05-07 6:57 ` Zi Yan 0 siblings, 0 replies; 23+ messages in thread From: Zi Yan @ 2026-05-07 6:57 UTC (permalink / raw) To: Andrew Morton, Nico Pache, Lance Yang Cc: Matthew Wilcox (Oracle), David Hildenbrand, Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 7 May 2026, at 14:08, Zi Yan wrote: > On 29 Apr 2026, at 23:29, Zi Yan wrote: > >> collapse_file() requires FSes supporting large folio with at least >> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. >> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem. >> >> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE. >> >> Add a helper function mapping_pmd_folio_support() for FSes supporting large >> folio with at least PMD_ORDER. 
>> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> Reviewed-by: Lance Yang <lance.yang@linux.dev> >> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> include/linux/pagemap.h | 26 ++++++++++++++++++++++++++ >> mm/khugepaged.c | 10 ++++++++-- >> 2 files changed, 34 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h >> index 1f50991b43e3b..1fed3414fe9b8 100644 >> --- a/include/linux/pagemap.h >> +++ b/include/linux/pagemap.h >> @@ -513,6 +513,32 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi >> return mapping_max_folio_order(mapping) > 0; >> } >> >> +/** >> + * mapping_pmd_folio_support() - Check if a mapping support PMD-sized folio >> + * @mapping: The address_space >> + * >> + * Some file supports large folio but does not support as large as PMD order. >> + * If a PMD-sized pagecache folio is attempted to be created on a filesystem, >> + * this check needs to be performed first. >> + * >> + * Return: true - PMD-sized folio is supported, false - PMD-sized folio is not >> + * supported. >> + */ >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE >> +static inline bool mapping_pmd_folio_support(const struct address_space *mapping) >> +{ >> + /* AS_FOLIO_ORDER is only reasonable for pagecache folios */ >> + VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON); >> + >> + return mapping_max_folio_order(mapping) >= PMD_ORDER; > > Hi Andrew, > > Can you help apply the fixup below? It addresses the concern raised by > Nico and Lance[1]. Thanks. > > [1] https://lore.kernel.org/all/aa778cfc-b7f8-4100-89bb-d2b2ef8e1138@redhat.com/ > I was missing a semicolon. Here is the right fixup. Sorry for the noise. From fbd183f7528a3d0bdb421018af4aef45f6366682 Mon Sep 17 00:00:00 2001 From: Zi Yan <ziy@nvidia.com> Date: Thu, 7 May 2026 02:02:28 -0400 Subject: [PATCH] fix mapping_pmd_folio_support() to represent its exact meaning. 
Signed-off-by: Zi Yan <ziy@nvidia.com> --- include/linux/pagemap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index c6a4ecd3d6ed1..41dbb55a47d8e 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -530,7 +530,8 @@ static inline bool mapping_pmd_folio_support(const struct address_space *mapping /* AS_FOLIO_ORDER is only reasonable for pagecache folios */ VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON); - return mapping_max_folio_order(mapping) >= PMD_ORDER; + return mapping_min_folio_order(mapping) <= PMD_ORDER && + mapping_max_folio_order(mapping) >= PMD_ORDER; } #else static inline bool mapping_pmd_folio_support(const struct address_space *mapping) -- 2.53.0 Best Regards, Yan, Zi ^ permalink raw reply related [flat|nested] 23+ messages in thread
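For readers following the fixup, the difference between the original v5 check and the corrected one can be sketched in user space. This is only an illustration under stated assumptions: struct sketch_mapping, SKETCH_PMD_ORDER and both sketch_* helpers are hypothetical stand-ins, not kernel API, and the value 9 is merely the PMD order seen on x86-64 with 4 KiB base pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in; the real PMD_ORDER depends on the architecture. */
#define SKETCH_PMD_ORDER 9u

/* Hypothetical stand-in for the folio order range a mapping advertises. */
struct sketch_mapping {
	unsigned int min_order;
	unsigned int max_order;
};

/* Shape of the original v5 check: only the upper bound is tested. */
static bool sketch_pmd_support_v5(const struct sketch_mapping *m)
{
	return m->max_order >= SKETCH_PMD_ORDER;
}

/* Shape of the fixup: the PMD order must lie inside [min_order, max_order]. */
static bool sketch_pmd_support_fixed(const struct sketch_mapping *m)
{
	/* A mapping whose minimum exceeds its maximum would be malformed. */
	assert(m->min_order <= m->max_order);

	return m->min_order <= SKETCH_PMD_ORDER &&
	       m->max_order >= SKETCH_PMD_ORDER;
}
```

The two variants only disagree for a mapping whose minimum folio order is larger than the PMD order; for such a mapping the v5 form would report support even though a PMD-sized folio is below the allowed minimum.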
* Re: [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check [not found] ` <20260429152924.727124-2-ziy@nvidia.com> 2026-05-07 6:08 ` [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan @ 2026-05-08 19:39 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 19:39 UTC (permalink / raw) To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4/29/26 17:29, Zi Yan wrote: > collapse_file() requires FSes supporting large folio with at least > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. > MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem. > > While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE. > > Add a helper function mapping_pmd_folio_support() for FSes supporting large > folio with at least PMD_ORDER.
> > Signed-off-by: Zi Yan <ziy@nvidia.com> > Reviewed-by: Lance Yang <lance.yang@linux.dev> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > include/linux/pagemap.h | 26 ++++++++++++++++++++++++++ > mm/khugepaged.c | 10 ++++++++-- > 2 files changed, 34 insertions(+), 2 deletions(-) > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index 1f50991b43e3b..1fed3414fe9b8 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -513,6 +513,32 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi > return mapping_max_folio_order(mapping) > 0; > } Some smaller doc improvements: > > +/** > + * mapping_pmd_folio_support() - Check if a mapping support PMD-sized folio "supports" > + * @mapping: The address_space > + * > + * Some file supports large folio but does not support as large as PMD order. "While some mappings support large folios, they might not support PMD-sized folios." > + * If a PMD-sized pagecache folio is attempted to be created on a filesystem, > + * this check needs to be performed first. I'd rather just add (related to the previous sentence): "This function checks whether a mapping supports PMD-sized folios. For example, khugepaged needs this information before attempting to collapse THPs". > + * > + * Return: true - PMD-sized folio is supported, false - PMD-sized folio is not > + * supported. I'd say "folios are supported" and end up with: "True if PMD-sized folios are supported, otherwise false." > + */ > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > +static inline bool mapping_pmd_folio_support(const struct address_space *mapping) > +{ > + /* AS_FOLIO_ORDER is only reasonable for pagecache folios */ > + VM_WARN_ON_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON); > + > + return mapping_max_folio_order(mapping) >= PMD_ORDER; Agreed that this should check for ==, although in practice it would currently not make a difference.
> +} > +#else > +static inline bool mapping_pmd_folio_support(const struct address_space *mapping) > +{ > + return false; > +} > +#endif > + > /* Return the maximum folio size for this pagecache mapping, in bytes. */ > static inline size_t mapping_max_folio_size(const struct address_space *mapping) > { > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e112525c4aa9c..6808f2b48d864 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2235,8 +2235,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, > int nr_none = 0; > bool is_shmem = shmem_file(file); > > - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); > - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); > + /* > + * MADV_COLLAPSE ignores shmem huge config, so do not check shmem > + * > + * TODO: once shmem always calls mapping_set_large_folios() on its > + * mapping, the shmem check can be removed. > + */ TODO LGTM. Acked-by: David Hildenbrand (Arm) <david@kernel.org> -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20260429153538.727855-1-ziy@nvidia.com>]
* Re: [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option [not found] ` <20260429153538.727855-1-ziy@nvidia.com> @ 2026-05-07 12:48 ` Lance Yang 2026-05-08 2:52 ` Wei Yang 1 sibling, 0 replies; 23+ messages in thread From: Lance Yang @ 2026-05-07 12:48 UTC (permalink / raw) To: ziy Cc: akpm, david, willy, songliubraving, clm, dsterba, viro, brauner, jack, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Wed, Apr 29, 2026 at 11:35:28AM -0400, Zi Yan wrote: >After removing READ_ONLY_THP_FOR_FS check in file_thp_enabled(), >khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache >support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig first >so that no one can use READ_ONLY_THP_FOR_FS as upcoming commits remove >mapping->nr_thps, which its safe guard mechanism relies on. > >Signed-off-by: Zi Yan <ziy@nvidia.com> >Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> >Acked-by: David Hildenbrand (Arm) <david@kernel.org> >Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> >--- READ_ONLY_THP_FOR_FS lost its job now that mapping_pmd_folio_support() does the gating ;) LGTM. Reviewed-by: Lance Yang <lance.yang@linux.dev> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option [not found] ` <20260429153538.727855-1-ziy@nvidia.com> 2026-05-07 12:48 ` [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Lance Yang @ 2026-05-08 2:52 ` Wei Yang 2026-05-08 3:22 ` Lance Yang 1 sibling, 1 reply; 23+ messages in thread From: Wei Yang @ 2026-05-08 2:52 UTC (permalink / raw) To: Zi Yan Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Wed, Apr 29, 2026 at 11:35:28AM -0400, Zi Yan wrote: >After removing READ_ONLY_THP_FOR_FS check in file_thp_enabled(), >khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache >support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig first >so that no one can use READ_ONLY_THP_FOR_FS as upcoming commits remove >mapping->nr_thps, which its safe guard mechanism relies on. > >Signed-off-by: Zi Yan <ziy@nvidia.com> >Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> >Acked-by: David Hildenbrand (Arm) <david@kernel.org> >Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> >--- > mm/Kconfig | 11 ----------- > 1 file changed, 11 deletions(-) > >diff --git a/mm/Kconfig b/mm/Kconfig >index e221fa1dc54d0..27dc5b0139ba6 100644 >--- a/mm/Kconfig >+++ b/mm/Kconfig >@@ -936,17 +936,6 @@ config THP_SWAP > > For selection by architectures with reasonable THP sizes. > >-config READ_ONLY_THP_FOR_FS >- bool "Read-only THP for filesystems (EXPERIMENTAL)" >- depends on TRANSPARENT_HUGEPAGE >- >- help >- Allow khugepaged to put read-only file-backed pages in THP. >- >- This is marked experimental because it is a new feature. 
Write >- support of file THPs will be developed in the next few release >- cycles. >- Hi, I see hugepage_enabled() in khugepaged.c still use READ_ONLY_THP_FOR_FS. > config NO_PAGE_MAPCOUNT > bool "No per-page mapcount (EXPERIMENTAL)" > help >-- >2.53.0 > -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option 2026-05-08 2:52 ` Wei Yang @ 2026-05-08 3:22 ` Lance Yang 0 siblings, 0 replies; 23+ messages in thread From: Lance Yang @ 2026-05-08 3:22 UTC (permalink / raw) To: richard.weiyang, ziy, akpm Cc: david, willy, songliubraving, clm, dsterba, viro, brauner, jack, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Fri, May 08, 2026 at 02:52:39AM +0000, Wei Yang wrote: >On Wed, Apr 29, 2026 at 11:35:28AM -0400, Zi Yan wrote: >>After removing READ_ONLY_THP_FOR_FS check in file_thp_enabled(), >>khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache >>support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig first >>so that no one can use READ_ONLY_THP_FOR_FS as upcoming commits remove >>mapping->nr_thps, which its safe guard mechanism relies on. >> >>Signed-off-by: Zi Yan <ziy@nvidia.com> >>Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> >>Acked-by: David Hildenbrand (Arm) <david@kernel.org> >>Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> >>--- >> mm/Kconfig | 11 ----------- >> 1 file changed, 11 deletions(-) >> >>diff --git a/mm/Kconfig b/mm/Kconfig >>index e221fa1dc54d0..27dc5b0139ba6 100644 >>--- a/mm/Kconfig >>+++ b/mm/Kconfig >>@@ -936,17 +936,6 @@ config THP_SWAP >> >> For selection by architectures with reasonable THP sizes. >> >>-config READ_ONLY_THP_FOR_FS >>- bool "Read-only THP for filesystems (EXPERIMENTAL)" >>- depends on TRANSPARENT_HUGEPAGE >>- >>- help >>- Allow khugepaged to put read-only file-backed pages in THP. >>- >>- This is marked experimental because it is a new feature. Write >>- support of file THPs will be developed in the next few release >>- cycles. >>- > >Hi, > >I see hugepage_enabled() in khugepaged.c still use READ_ONLY_THP_FOR_FS. > Yes, I noticed that[1] as well. 
Maybe it was dropped by accident :) [1] https://lore.kernel.org/linux-mm/20260507044938.12529-1-lance.yang@linux.dev/ ^ permalink raw reply [flat|nested] 23+ messages in thread
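The thread does not show how hugepage_enabled() references the removed option. Assuming, purely for illustration, that the leftover is an IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) style test, the sketch below imitates that preprocessor trick in user space to show why such a leftover would still compile and simply evaluate to 0. Every SKETCH_* and sketch_* name is hypothetical, and the macro chain is a simplified imitation rather than the kernel header itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space imitation of an IS_ENABLED()-style test. */
#define SKETCH_ARG_PLACEHOLDER_1 0,
#define sketch_take_second_arg(ignored, val, ...) val
#define sketch_is_defined(x) sketch_is_defined2(x)
#define sketch_is_defined2(val) sketch_is_defined3(SKETCH_ARG_PLACEHOLDER_##val)
#define sketch_is_defined3(arg1_or_junk) \
	sketch_take_second_arg(arg1_or_junk 1, 0, 0)
#define SKETCH_IS_ENABLED(option) sketch_is_defined(option)

/* Hypothetical option that still exists and is set to 1. */
#define CONFIG_SKETCH_PRESENT 1
/* CONFIG_SKETCH_REMOVED is deliberately never defined, like a deleted Kconfig entry. */

static_assert(SKETCH_IS_ENABLED(CONFIG_SKETCH_PRESENT) == 1,
	      "a defined option reads as 1");
static_assert(SKETCH_IS_ENABLED(CONFIG_SKETCH_REMOVED) == 0,
	      "a missing option reads as 0 instead of breaking the build");

/* A gate that still tests the removed option can never return true. */
static bool sketch_gate_with_removed_option(bool global_enabled)
{
	return SKETCH_IS_ENABLED(CONFIG_SKETCH_REMOVED) && global_enabled;
}
```

Under that assumption a stale reference would not cause a build failure; the branch it guards would just become dead code, which is why such leftovers are easy to miss and are usually caught in review.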
[parent not found: <20260429153538.727855-5-ziy@nvidia.com>]
* Re: [PATCH v5 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() [not found] ` <20260429153538.727855-5-ziy@nvidia.com> @ 2026-05-08 7:01 ` Lance Yang 2026-05-08 19:46 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 23+ messages in thread From: Lance Yang @ 2026-05-08 7:01 UTC (permalink / raw) To: ziy Cc: akpm, david, willy, songliubraving, clm, dsterba, viro, brauner, jack, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Wed, Apr 29, 2026 at 11:35:32AM -0400, Zi Yan wrote: >After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or >not. folio_split() can be used on a FS with large folio support without >worrying about getting a THP on a FS without large folio support. > >When READ_ONLY_THP_FOR_FS was present, a PMD large pagecache folio can >appear in a FS without large folio support after khugepaged or >madvise(MADV_COLLAPSE) creates it. During truncate_inode_partial_folio(), >such a PMD large pagecache folio is split and if the FS does not support >large folio, it needs to be split to order-0 ones and could not be split >non uniformly to ones with various orders. try_folio_split_to_order() was >added to handle this situation by checking folio_check_splittable(..., >SPLIT_TYPE_NON_UNIFORM) to detect if the large folio is created due to >READ_ONLY_THP_FOR_FS and the FS does not support large folio. Now >READ_ONLY_THP_FOR_FS is removed, all large pagecache folios are created >with FSes supporting large folio, this function is no longer needed and all >large pagecache folios can be split non uniformly.
> >Signed-off-by: Zi Yan <ziy@nvidia.com> >--- With that gone, large page-cache folios should only exist on mappings that support large folios, so folio_split() with mapping_min_folio_order() is enough :) LGTM, feel free to add: Reviewed-by: Lance Yang <lance.yang@linux.dev> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() [not found] ` <20260429153538.727855-5-ziy@nvidia.com> 2026-05-08 7:01 ` [PATCH v5 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() Lance Yang @ 2026-05-08 19:46 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 19:46 UTC (permalink / raw) To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4/29/26 17:35, Zi Yan wrote: > After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or > not. folio_split() can be used on a FS with large folio support without > worrying about getting a THP on a FS without large folio support. > > When READ_ONLY_THP_FOR_FS was present, a PMD large pagecache folio can > appear in a FS without large folio support after khugepaged or > madvise(MADV_COLLAPSE) creates it. During truncate_inode_partial_folio(), > such a PMD large pagecache folio is split and if the FS does not support > large folio, it needs to be split to order-0 ones and could not be split > non uniformly to ones with various orders. try_folio_split_to_order() was > added to handle this situation by checking folio_check_splittable(..., > SPLIT_TYPE_NON_UNIFORM) to detect if the large folio is created due to > READ_ONLY_THP_FOR_FS and the FS does not support large folio. Now > READ_ONLY_THP_FOR_FS is removed, all large pagecache folios are created > with FSes supporting large folio, this function is no longer needed and all > large pagecache folios can be split non uniformly. 
> > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- Acked-by: David Hildenbrand (Arm) <david@kernel.org> -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20260429153538.727855-9-ziy@nvidia.com>]
* Re: [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files [not found] ` <20260429153538.727855-9-ziy@nvidia.com> @ 2026-05-08 7:46 ` Lance Yang [not found] ` <CFECCB44-EEEA-4D3F-A505-3BA2C564C107@nvidia.com> 2026-05-08 20:13 ` David Hildenbrand (Arm) 2 siblings, 0 replies; 23+ messages in thread From: Lance Yang @ 2026-05-08 7:46 UTC (permalink / raw) To: ziy Cc: akpm, david, willy, songliubraving, clm, dsterba, viro, brauner, jack, ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Wed, Apr 29, 2026 at 11:35:36AM -0400, Zi Yan wrote: >collapse_file() is capable of collapsing pagecache folios from writable >files to PMD folios. Now enable clean pagecache folio collapse in addition >to read-only pagecache folio collapse by removing the >inode_is_open_for_write() from file_thp_enabled() and only performing >filemap_flush() if the file is read-only. > >This means userspace needs to explicitly flush the content of pagecache >folios before khugepaged can collapse the folios, or use >madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is >that blindly enabling dirty pagecache folio from writable files collapse >makes khugepaged flush these folios all the time. It is undesirable to >cause system level pagecache flushes. > >To properly support dirty pagecache folio collapse, filemap_flush() needs >to be avoided. Potentially, merging associated buffer instead of dropping >it with filemap_release_folio() might be needed. > >NOTE: this breaks khugepaged selftests for writable file pagecache >collapse, which is set to fail all the time. The next commit fix it. 
> >Signed-off-by: Zi Yan <ziy@nvidia.com> >--- > mm/huge_memory.c | 2 +- > mm/khugepaged.c | 9 ++++++++- > 2 files changed, 9 insertions(+), 2 deletions(-) > >diff --git a/mm/huge_memory.c b/mm/huge_memory.c >index 9b3abb98a7e51..e1e9d59db6e70 100644 >--- a/mm/huge_memory.c >+++ b/mm/huge_memory.c >@@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) > if (!mapping_pmd_folio_support(vma->vm_file->f_mapping)) > return false; > >- return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); >+ return S_ISREG(inode->i_mode); > } > > /* If returns true, we are unable to access the VMA's folios. */ >diff --git a/mm/khugepaged.c b/mm/khugepaged.c >index 1ee15b48962a3..fb7ff643973cc 100644 >--- a/mm/khugepaged.c >+++ b/mm/khugepaged.c >@@ -2345,7 +2345,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, > * forcing writeback in loop. > */ Nit: the comment above now looks stale. It still says: "There won’t be new dirty pages." That was true when file_thp_enabled() rejected writable-open files, but not after this patch ;) Otherwise, LGTM. Reviewed-by: Lance Yang <lance.yang@linux.dev> > xas_unlock_irq(&xas); >- filemap_flush(mapping); >+ /* >+ * Only flush for read-only files. Writable >+ * files can have their folios dirty at any >+ * time; blindly flushing them would cause >+ * undesirable system-wide writeback. >+ */ >+ if (!inode_is_open_for_write(mapping->host)) >+ filemap_flush(mapping); > result = SCAN_PAGE_DIRTY_OR_WRITEBACK; > goto xa_unlocked; > } else if (folio_test_writeback(folio)) { >-- >2.53.0 > > ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <CFECCB44-EEEA-4D3F-A505-3BA2C564C107@nvidia.com>]
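The commit message for patch 13/14 says that user space must write back dirty pagecache itself before a writable file can be collapsed, or else rely on madvise(MADV_COLLAPSE). A minimal user-space sketch of that sequence follows. The helper name sketch_flush_then_collapse() is hypothetical, the fallback value 25 for MADV_COLLAPSE is an assumption for older libc headers, and success depends on the filesystem, the kernel configuration and the alignment of the mapped range, none of which this sketch checks.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* assumed fallback when the libc headers lack it */
#endif

/*
 * Hypothetical helper: write back a file's dirty pagecache, then ask the
 * kernel to collapse the mapped range. Returns 0 or a negative errno.
 */
static int sketch_flush_then_collapse(const char *path, size_t len)
{
	void *addr;
	int ret = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	/* Explicit writeback so the folios are clean before any collapse attempt. */
	if (fsync(fd) < 0) {
		ret = -errno;
		goto out_close;
	}

	addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}

	/* Best effort: the kernel may refuse, and the error is passed back as is. */
	if (madvise(addr, len, MADV_COLLAPSE) < 0)
		ret = -errno;

	munmap(addr, len);
out_close:
	close(fd);
	return ret;
}
```

A caller would treat a negative return as "not collapsed" and carry on with small pages; the sketch makes no attempt to retry or to align the mapping to a PMD boundary.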
* Re: [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files [not found] ` <CFECCB44-EEEA-4D3F-A505-3BA2C564C107@nvidia.com> @ 2026-05-08 20:09 ` David Hildenbrand (Arm) 0 siblings, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 20:09 UTC (permalink / raw) To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4/30/26 17:18, Zi Yan wrote: > On 29 Apr 2026, at 11:35, Zi Yan wrote: > >> collapse_file() is capable of collapsing pagecache folios from writable >> files to PMD folios. Now enable clean pagecache folio collapse in addition >> to read-only pagecache folio collapse by removing the >> inode_is_open_for_write() from file_thp_enabled() and only performing >> filemap_flush() if the file is read-only. >> >> This means userspace needs to explicitly flush the content of pagecache >> folios before khugepaged can collapse the folios, or use >> madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is >> that blindly enabling dirty pagecache folio from writable files collapse >> makes khugepaged flush these folios all the time. It is undesirable to >> cause system level pagecache flushes. >> >> To properly support dirty pagecache folio collapse, filemap_flush() needs >> to be avoided. Potentially, merging associated buffer instead of dropping >> it with filemap_release_folio() might be needed. >> >> NOTE: this breaks khugepaged selftests for writable file pagecache >> collapse, which is set to fail all the time. The next commit fix it. > > Sashiko: > > Is it acceptable to intentionally break the selftests in this commit? 
Each > commit should be self-contained and not knowingly introduce test regressions, > as this breaks bisectability. > > Answer: > > I am fine with squashing patch 14 into this one, but it is unlikely anyone > gets a kernel at exact this commit. For bisecting maybe problematic. I wouldn't squash it, though, instead prepare the selftests in a different way beforehand, such that they won't get broken? -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files [not found] ` <20260429153538.727855-9-ziy@nvidia.com> 2026-05-08 7:46 ` [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files Lance Yang [not found] ` <CFECCB44-EEEA-4D3F-A505-3BA2C564C107@nvidia.com> @ 2026-05-08 20:13 ` David Hildenbrand (Arm) 2 siblings, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 20:13 UTC (permalink / raw) To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4/29/26 17:35, Zi Yan wrote: > collapse_file() is capable of collapsing pagecache folios from writable > files to PMD folios. Now enable clean pagecache folio collapse in addition > to read-only pagecache folio collapse by removing the > inode_is_open_for_write() from file_thp_enabled() and only performing > filemap_flush() if the file is read-only. > > This means userspace needs to explicitly flush the content of pagecache > folios before khugepaged can collapse the folios, or use > madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is > that blindly enabling dirty pagecache folio from writable files collapse > makes khugepaged flush these folios all the time. It is undesirable to > cause system level pagecache flushes. > > To properly support dirty pagecache folio collapse, filemap_flush() needs > to be avoided. Potentially, merging associated buffer instead of dropping > it with filemap_release_folio() might be needed. > > NOTE: this breaks khugepaged selftests for writable file pagecache > collapse, which is set to fail all the time. The next commit fix it. 
> > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- > mm/huge_memory.c | 2 +- > mm/khugepaged.c | 9 ++++++++- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 9b3abb98a7e51..e1e9d59db6e70 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) > if (!mapping_pmd_folio_support(vma->vm_file->f_mapping)) > return false; > > - return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); > + return S_ISREG(inode->i_mode); > } > > /* If returns true, we are unable to access the VMA's folios. */ > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 1ee15b48962a3..fb7ff643973cc 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2345,7 +2345,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, > * forcing writeback in loop. > */ > xas_unlock_irq(&xas); > - filemap_flush(mapping); > + /* > + * Only flush for read-only files. Writable > + * files can have their folios dirty at any > + * time; blindly flushing them would cause > + * undesirable system-wide writeback. > + */ That comment should really be merged in the comment above. Also, there we say "khugepaged only works on read-only fd" ... which is now just wrong? Please revise that whole comment as you incorporate your comment. Apart from that I guess this is fine ... or we'll learn rather quickly, haha. -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
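The commit message above says userspace must flush dirty pagecache itself before khugepaged can collapse folios of a writable file, or use madvise(MADV_COLLAPSE). A minimal userspace sketch of those two steps might look like the following. The helper names flush_mapping() and try_collapse() are hypothetical and not part of the patch; the fallback MADV_COLLAPSE value is the one from the Linux UAPI headers, and whether a collapse actually succeeds depends on the kernel, the file system, and the alignment of the range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* Linux UAPI value, for older libc headers */
#endif

/*
 * Hypothetical helper: synchronously write back dirty pages behind a
 * MAP_SHARED mapping so the backing pagecache folios become clean.
 * Returns 0 on success or -errno.
 */
static int flush_mapping(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);

	if (msync(addr, len, MS_SYNC))
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: ask the kernel for a synchronous collapse of the
 * range. Failure is not unusual (old kernel, unsupported file system,
 * range smaller than a PMD), so the error is returned, not treated as fatal.
 * Returns 0 on success or -errno.
 */
static int try_collapse(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLLAPSE))
		return -errno;
	return 0;
}
```

A caller would typically write through the mapping, call flush_mapping(), and then either wait for khugepaged or call try_collapse() on a PMD-aligned range.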
[parent not found: <20260429153538.727855-7-ziy@nvidia.com>]
[parent not found: <52285e2c-af42-4c0d-9926-017f80b6614c@redhat.com>]
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged [not found] ` <52285e2c-af42-4c0d-9926-017f80b6614c@redhat.com> @ 2026-05-06 13:11 ` Zi Yan 2026-05-08 19:51 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 23+ messages in thread From: Zi Yan @ 2026-05-06 13:11 UTC (permalink / raw) To: Nico Pache Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4 May 2026, at 12:23, Nico Pache wrote: > On 4/29/26 9:35 AM, Zi Yan wrote: >> Change the requirement to a file system with large folio support and the >> supported order needs to include PMD_ORDER. >> >> Also add tests of opening a file with read write permission and populating >> folios with writes. Reuse the XFS image from split_huge_page_test. 
>> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- >> tools/testing/selftests/mm/run_vmtests.sh | 12 +- >> 2 files changed, 102 insertions(+), 41 deletions(-) >> >> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c >> index a6bb9d50363d2..80b913185c643 100644 >> --- a/tools/testing/selftests/mm/khugepaged.c >> +++ b/tools/testing/selftests/mm/khugepaged.c >> @@ -49,7 +49,8 @@ struct mem_ops { >> const char *name; >> }; >> -static struct mem_ops *file_ops; >> +static struct mem_ops *read_only_file_ops; >> +static struct mem_ops *read_write_file_ops; >> static struct mem_ops *anon_ops; >> static struct mem_ops *shmem_ops; >> @@ -112,7 +113,8 @@ static void restore_settings(int sig) >> static void save_settings(void) >> { >> printf("Save THP and khugepaged settings..."); >> - if (file_ops && finfo.type == VMA_FILE) >> + if ((read_only_file_ops || read_write_file_ops) && >> + finfo.type == VMA_FILE) >> thp_set_read_ahead_path(finfo.dev_queue_read_ahead_path); >> thp_save_settings(); >> @@ -364,11 +366,14 @@ static bool anon_check_huge(void *addr, int nr_hpages) >> return check_huge_anon(addr, nr_hpages, hpage_pmd_size); >> } >> -static void *file_setup_area(int nr_hpages) >> +static void *file_setup_area_common(int nr_hpages, bool read_only) >> { >> int fd; >> void *p; >> unsigned long size; >> + int open_opt = read_only ? O_RDONLY : O_RDWR; >> + int mmap_prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE); >> + int mmap_opt = read_only ? 
MAP_PRIVATE : MAP_SHARED; >> unlink(finfo.path); /* Cleanup from previous failed tests */ >> printf("Creating %s for collapse%s...", finfo.path, >> @@ -399,14 +404,15 @@ static void *file_setup_area(int nr_hpages) >> munmap(p, size); >> success("OK"); >> - printf("Opening %s read only for collapse...", finfo.path); >> - finfo.fd = open(finfo.path, O_RDONLY, 777); >> + printf("Opening %s %s for collapse...", finfo.path, >> + read_only ? "read only" : "read-write"); >> + finfo.fd = open(finfo.path, open_opt, 777); >> if (finfo.fd < 0) { >> perror("open()"); >> exit(EXIT_FAILURE); >> } >> - p = mmap(BASE_ADDR, size, PROT_READ, >> - MAP_PRIVATE, finfo.fd, 0); >> + p = mmap(BASE_ADDR, size, mmap_prot, >> + mmap_opt, finfo.fd, 0); >> if (p == MAP_FAILED || p != BASE_ADDR) { >> perror("mmap()"); >> exit(EXIT_FAILURE); >> @@ -418,6 +424,16 @@ static void *file_setup_area(int nr_hpages) >> return p; >> } >> +static void *file_setup_read_only_area(int nr_hpages) >> +{ >> + return file_setup_area_common(nr_hpages, /* read_only= */ true); >> +} >> + >> +static void *file_setup_read_write_area(int nr_hpages) >> +{ >> + return file_setup_area_common(nr_hpages, /* read_only= */ false); >> +} >> + >> static void file_cleanup_area(void *p, unsigned long size) >> { >> munmap(p, size); >> @@ -425,14 +441,25 @@ static void file_cleanup_area(void *p, unsigned long size) >> unlink(finfo.path); >> } >> -static void file_fault(void *p, unsigned long start, unsigned long end) >> +static void file_fault_common(void *p, unsigned long start, unsigned long end, >> + int madv_ops) >> { >> - if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) { >> + if (madvise(((char *)p) + start, end - start, madv_ops)) { >> perror("madvise(MADV_POPULATE_READ"); >> exit(EXIT_FAILURE); >> } >> } >> +static void file_fault_read(void *p, unsigned long start, unsigned long end) >> +{ >> + file_fault_common(p, start, end, MADV_POPULATE_READ); >> +} >> + >> +static void file_fault_write(void *p, 
unsigned long start, unsigned long end) >> +{ >> + file_fault_common(p, start, end, MADV_POPULATE_WRITE); >> +} >> + >> static bool file_check_huge(void *addr, int nr_hpages) >> { >> switch (finfo.type) { >> @@ -488,10 +515,18 @@ static struct mem_ops __anon_ops = { >> .name = "anon", >> }; >> -static struct mem_ops __file_ops = { >> - .setup_area = &file_setup_area, >> +static struct mem_ops __read_only_file_ops = { >> + .setup_area = &file_setup_read_only_area, >> .cleanup_area = &file_cleanup_area, >> - .fault = &file_fault, >> + .fault = &file_fault_read, >> + .check_huge = &file_check_huge, >> + .name = "file", >> +}; >> + >> +static struct mem_ops __read_write_file_ops = { >> + .setup_area = &file_setup_read_write_area, >> + .cleanup_area = &file_cleanup_area, >> + .fault = &file_fault_write, >> .check_huge = &file_check_huge, >> .name = "file", >> }; >> @@ -504,6 +539,18 @@ static struct mem_ops __shmem_ops = { >> .name = "shmem", >> }; >> +static bool is_tmpfs(struct mem_ops *ops) >> +{ >> + return (ops == &__read_only_file_ops || >> + ops == &__read_write_file_ops) && >> + finfo.type == VMA_SHMEM; >> +} >> + >> +static bool is_anon(struct mem_ops *ops) >> +{ >> + return ops == &__anon_ops; >> +} >> + >> static void __madvise_collapse(const char *msg, char *p, int nr_hpages, >> struct mem_ops *ops, bool expect) >> { >> @@ -512,6 +559,10 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages, >> printf("%s...", msg); >> + /* read&write file collapse always fail */ > > Just to confirm, you are adding the write part here so that before commit 13 & 14, the behavior is that it will fail. Whereas after with patch 13/14, we expect this behavior to be supported correct? Yes. > > Thanks for working on this :) > > I plan on testing the selftests changes at some point this week (if I find some downtime during LSFMM), and finishing my review here. > Thanks. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged 2026-05-06 13:11 ` [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan @ 2026-05-08 19:51 ` David Hildenbrand (Arm) 0 siblings, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 19:51 UTC (permalink / raw) To: Zi Yan, Nico Pache Cc: Andrew Morton, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 5/6/26 15:11, Zi Yan wrote: > On 4 May 2026, at 12:23, Nico Pache wrote: > >> On 4/29/26 9:35 AM, Zi Yan wrote: >>> Change the requirement to a file system with large folio support and the >>> supported order needs to include PMD_ORDER. >>> >>> Also add tests of opening a file with read write permission and populating >>> folios with writes. Reuse the XFS image from split_huge_page_test. 
>>> >>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>> --- >>> tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- >>> tools/testing/selftests/mm/run_vmtests.sh | 12 +- >>> 2 files changed, 102 insertions(+), 41 deletions(-) >>> >>> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c >>> index a6bb9d50363d2..80b913185c643 100644 >>> --- a/tools/testing/selftests/mm/khugepaged.c >>> +++ b/tools/testing/selftests/mm/khugepaged.c >>> @@ -49,7 +49,8 @@ struct mem_ops { >>> const char *name; >>> }; >>> -static struct mem_ops *file_ops; >>> +static struct mem_ops *read_only_file_ops; >>> +static struct mem_ops *read_write_file_ops; >>> static struct mem_ops *anon_ops; >>> static struct mem_ops *shmem_ops; >>> @@ -112,7 +113,8 @@ static void restore_settings(int sig) >>> static void save_settings(void) >>> { >>> printf("Save THP and khugepaged settings..."); >>> - if (file_ops && finfo.type == VMA_FILE) >>> + if ((read_only_file_ops || read_write_file_ops) && >>> + finfo.type == VMA_FILE) >>> thp_set_read_ahead_path(finfo.dev_queue_read_ahead_path); >>> thp_save_settings(); >>> @@ -364,11 +366,14 @@ static bool anon_check_huge(void *addr, int nr_hpages) >>> return check_huge_anon(addr, nr_hpages, hpage_pmd_size); >>> } >>> -static void *file_setup_area(int nr_hpages) >>> +static void *file_setup_area_common(int nr_hpages, bool read_only) >>> { >>> int fd; >>> void *p; >>> unsigned long size; >>> + int open_opt = read_only ? O_RDONLY : O_RDWR; >>> + int mmap_prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE); >>> + int mmap_opt = read_only ? 
MAP_PRIVATE : MAP_SHARED; >>> unlink(finfo.path); /* Cleanup from previous failed tests */ >>> printf("Creating %s for collapse%s...", finfo.path, >>> @@ -399,14 +404,15 @@ static void *file_setup_area(int nr_hpages) >>> munmap(p, size); >>> success("OK"); >>> - printf("Opening %s read only for collapse...", finfo.path); >>> - finfo.fd = open(finfo.path, O_RDONLY, 777); >>> + printf("Opening %s %s for collapse...", finfo.path, >>> + read_only ? "read only" : "read-write"); >>> + finfo.fd = open(finfo.path, open_opt, 777); >>> if (finfo.fd < 0) { >>> perror("open()"); >>> exit(EXIT_FAILURE); >>> } >>> - p = mmap(BASE_ADDR, size, PROT_READ, >>> - MAP_PRIVATE, finfo.fd, 0); >>> + p = mmap(BASE_ADDR, size, mmap_prot, >>> + mmap_opt, finfo.fd, 0); >>> if (p == MAP_FAILED || p != BASE_ADDR) { >>> perror("mmap()"); >>> exit(EXIT_FAILURE); >>> @@ -418,6 +424,16 @@ static void *file_setup_area(int nr_hpages) >>> return p; >>> } >>> +static void *file_setup_read_only_area(int nr_hpages) >>> +{ >>> + return file_setup_area_common(nr_hpages, /* read_only= */ true); >>> +} >>> + >>> +static void *file_setup_read_write_area(int nr_hpages) >>> +{ >>> + return file_setup_area_common(nr_hpages, /* read_only= */ false); >>> +} >>> + >>> static void file_cleanup_area(void *p, unsigned long size) >>> { >>> munmap(p, size); >>> @@ -425,14 +441,25 @@ static void file_cleanup_area(void *p, unsigned long size) >>> unlink(finfo.path); >>> } >>> -static void file_fault(void *p, unsigned long start, unsigned long end) >>> +static void file_fault_common(void *p, unsigned long start, unsigned long end, >>> + int madv_ops) >>> { >>> - if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) { >>> + if (madvise(((char *)p) + start, end - start, madv_ops)) { >>> perror("madvise(MADV_POPULATE_READ"); >>> exit(EXIT_FAILURE); >>> } >>> } >>> +static void file_fault_read(void *p, unsigned long start, unsigned long end) >>> +{ >>> + file_fault_common(p, start, end, MADV_POPULATE_READ); >>> 
+} >>> + >>> +static void file_fault_write(void *p, unsigned long start, unsigned long end) >>> +{ >>> + file_fault_common(p, start, end, MADV_POPULATE_WRITE); >>> +} >>> + >>> static bool file_check_huge(void *addr, int nr_hpages) >>> { >>> switch (finfo.type) { >>> @@ -488,10 +515,18 @@ static struct mem_ops __anon_ops = { >>> .name = "anon", >>> }; >>> -static struct mem_ops __file_ops = { >>> - .setup_area = &file_setup_area, >>> +static struct mem_ops __read_only_file_ops = { >>> + .setup_area = &file_setup_read_only_area, >>> .cleanup_area = &file_cleanup_area, >>> - .fault = &file_fault, >>> + .fault = &file_fault_read, >>> + .check_huge = &file_check_huge, >>> + .name = "file", >>> +}; >>> + >>> +static struct mem_ops __read_write_file_ops = { >>> + .setup_area = &file_setup_read_write_area, >>> + .cleanup_area = &file_cleanup_area, >>> + .fault = &file_fault_write, >>> .check_huge = &file_check_huge, >>> .name = "file", >>> }; >>> @@ -504,6 +539,18 @@ static struct mem_ops __shmem_ops = { >>> .name = "shmem", >>> }; >>> +static bool is_tmpfs(struct mem_ops *ops) >>> +{ >>> + return (ops == &__read_only_file_ops || >>> + ops == &__read_write_file_ops) && >>> + finfo.type == VMA_SHMEM; >>> +} >>> + >>> +static bool is_anon(struct mem_ops *ops) >>> +{ >>> + return ops == &__anon_ops; >>> +} >>> + >>> static void __madvise_collapse(const char *msg, char *p, int nr_hpages, >>> struct mem_ops *ops, bool expect) >>> { >>> @@ -512,6 +559,10 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages, >>> printf("%s...", msg); >>> + /* read&write file collapse always fail */ >> >> Just to confirm, you are adding the write part here so that before commit 13 & 14, the behavior is that it will fail. Whereas after with patch 13/14, we expect this behavior to be supported correct? > Yes. Confusing, we usually add new test after adding new functionality, not the other way around? :) -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <7e42faea-9f55-4722-a426-94be7fc3a49b@redhat.com>]
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged [not found] ` <7e42faea-9f55-4722-a426-94be7fc3a49b@redhat.com> @ 2026-05-06 13:15 ` Zi Yan 2026-05-07 6:35 ` Nico Pache 0 siblings, 1 reply; 23+ messages in thread From: Zi Yan @ 2026-05-06 13:15 UTC (permalink / raw) To: Nico Pache Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4 May 2026, at 18:11, Nico Pache wrote: > On 4/29/26 9:35 AM, Zi Yan wrote: >> Change the requirement to a file system with large folio support and the >> supported order needs to include PMD_ORDER. >> >> Also add tests of opening a file with read write permission and populating >> folios with writes. Reuse the XFS image from split_huge_page_test. 
>> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- >> tools/testing/selftests/mm/run_vmtests.sh | 12 +- >> 2 files changed, 102 insertions(+), 41 deletions(-) >> <snip> >> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh >> index 3b61677fe9840..854c5c3e3a6ae 100755 >> --- a/tools/testing/selftests/mm/run_vmtests.sh >> +++ b/tools/testing/selftests/mm/run_vmtests.sh >> @@ -490,8 +490,6 @@ CATEGORY="thp" run_test ./khugepaged all:shmem >> CATEGORY="thp" run_test ./khugepaged -s 4 all:shmem >> -CATEGORY="thp" run_test ./transhuge-stress -d 20 >> - >> # Try to create XFS if not provided >> if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then >> if [ "${HAVE_HUGEPAGES}" = "1" ]; then >> @@ -508,6 +506,14 @@ if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then >> fi >> fi >> +if [ -n "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then >> +CATEGORY="thp" run_test ./khugepaged all:file ${SPLIT_HUGE_PAGE_TEST_XFS_PATH} >> +else >> + count_total=$(( count_total + 1 )) >> + count_skip=$(( count_skip + 1 )) >> + echo "[SKIP] ./khugepaged all:file" | tap_prefix > > This leads selftest runs to always litter the output with SKIP when running this with the wrapper > > make -C tools/testing/selftests TARGETS=mm run_tests Yes, this is intended to let people know one case is not tested and skipped if XFS cannot be created. > >> +fi >> + >> CATEGORY="thp" run_test ./split_huge_page_test ${SPLIT_HUGE_PAGE_TEST_XFS_PATH} >> if [ -n "${MOUNTED_XFS}" ]; then >> @@ -516,6 +522,8 @@ if [ -n "${MOUNTED_XFS}" ]; then >> rm -f ${XFS_IMG} >> fi >> +CATEGORY="thp" run_test ./transhuge-stress -d 20 >> + >> CATEGORY="thp" run_test ./folio_split_race_test >> CATEGORY="migration" run_test ./migration Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged 2026-05-06 13:15 ` Zi Yan @ 2026-05-07 6:35 ` Nico Pache 2026-05-07 7:21 ` Zi Yan 0 siblings, 1 reply; 23+ messages in thread From: Nico Pache @ 2026-05-07 6:35 UTC (permalink / raw) To: Zi Yan Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On Wed, May 6, 2026 at 7:15 AM Zi Yan <ziy@nvidia.com> wrote: > > On 4 May 2026, at 18:11, Nico Pache wrote: > > > On 4/29/26 9:35 AM, Zi Yan wrote: > >> Change the requirement to a file system with large folio support and the > >> supported order needs to include PMD_ORDER. > >> > >> Also add tests of opening a file with read write permission and populating > >> folios with writes. Reuse the XFS image from split_huge_page_test. 
> >> > >> Signed-off-by: Zi Yan <ziy@nvidia.com> > >> --- > >> tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- > >> tools/testing/selftests/mm/run_vmtests.sh | 12 +- > >> 2 files changed, 102 insertions(+), 41 deletions(-) > >> > > <snip> > > >> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh > >> index 3b61677fe9840..854c5c3e3a6ae 100755 > >> --- a/tools/testing/selftests/mm/run_vmtests.sh > >> +++ b/tools/testing/selftests/mm/run_vmtests.sh > >> @@ -490,8 +490,6 @@ CATEGORY="thp" run_test ./khugepaged all:shmem > >> CATEGORY="thp" run_test ./khugepaged -s 4 all:shmem > >> -CATEGORY="thp" run_test ./transhuge-stress -d 20 > >> - > >> # Try to create XFS if not provided > >> if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then > >> if [ "${HAVE_HUGEPAGES}" = "1" ]; then > >> @@ -508,6 +506,14 @@ if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then > >> fi > >> fi > >> +if [ -n "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then > >> +CATEGORY="thp" run_test ./khugepaged all:file ${SPLIT_HUGE_PAGE_TEST_XFS_PATH} > >> +else > >> + count_total=$(( count_total + 1 )) > >> + count_skip=$(( count_skip + 1 )) > >> + echo "[SKIP] ./khugepaged all:file" | tap_prefix > > > > This leads selftest runs to always litter the output with SKIP when running this with the wrapper > > > > make -C tools/testing/selftests TARGETS=mm run_tests > > Yes, this is intended to let people know one case is not tested and skipped if XFS cannot be created. 
Yes but it prints after each test case run, not just the khugepaged runs > > > > >> +fi > >> + > >> CATEGORY="thp" run_test ./split_huge_page_test ${SPLIT_HUGE_PAGE_TEST_XFS_PATH} > >> if [ -n "${MOUNTED_XFS}" ]; then > >> @@ -516,6 +522,8 @@ if [ -n "${MOUNTED_XFS}" ]; then > >> rm -f ${XFS_IMG} > >> fi > >> +CATEGORY="thp" run_test ./transhuge-stress -d 20 > >> + > >> CATEGORY="thp" run_test ./folio_split_race_test > >> CATEGORY="migration" run_test ./migration > > > Best Regards, > Yan, Zi > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged 2026-05-07 6:35 ` Nico Pache @ 2026-05-07 7:21 ` Zi Yan 0 siblings, 0 replies; 23+ messages in thread From: Zi Yan @ 2026-05-07 7:21 UTC (permalink / raw) To: Nico Pache Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 7 May 2026, at 14:35, Nico Pache wrote: > On Wed, May 6, 2026 at 7:15 AM Zi Yan <ziy@nvidia.com> wrote: >> >> On 4 May 2026, at 18:11, Nico Pache wrote: >> >>> On 4/29/26 9:35 AM, Zi Yan wrote: >>>> Change the requirement to a file system with large folio support and the >>>> supported order needs to include PMD_ORDER. >>>> >>>> Also add tests of opening a file with read write permission and populating >>>> folios with writes. Reuse the XFS image from split_huge_page_test. 
>>>> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>> --- >>>> tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- >>>> tools/testing/selftests/mm/run_vmtests.sh | 12 +- >>>> 2 files changed, 102 insertions(+), 41 deletions(-) >>>> >> >> <snip> >> >>>> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh >>>> index 3b61677fe9840..854c5c3e3a6ae 100755 >>>> --- a/tools/testing/selftests/mm/run_vmtests.sh >>>> +++ b/tools/testing/selftests/mm/run_vmtests.sh >>>> @@ -490,8 +490,6 @@ CATEGORY="thp" run_test ./khugepaged all:shmem >>>> CATEGORY="thp" run_test ./khugepaged -s 4 all:shmem >>>> -CATEGORY="thp" run_test ./transhuge-stress -d 20 >>>> - >>>> # Try to create XFS if not provided >>>> if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then >>>> if [ "${HAVE_HUGEPAGES}" = "1" ]; then >>>> @@ -508,6 +506,14 @@ if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then >>>> fi >>>> fi >>>> +if [ -n "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then >>>> +CATEGORY="thp" run_test ./khugepaged all:file ${SPLIT_HUGE_PAGE_TEST_XFS_PATH} >>>> +else >>>> + count_total=$(( count_total + 1 )) >>>> + count_skip=$(( count_skip + 1 )) >>>> + echo "[SKIP] ./khugepaged all:file" | tap_prefix >>> >>> This leads selftest runs to always litter the output with SKIP when running this with the wrapper >>> >>> make -C tools/testing/selftests TARGETS=mm run_tests >> >> Yes, this is intended to let people know one case is not tested and skipped if XFS cannot be created. > > Yes but it prints after each test case run, not just the khugepaged runs You are right. Let me send a fixup. Thank you. Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged [not found] ` <20260429153538.727855-7-ziy@nvidia.com> [not found] ` <52285e2c-af42-4c0d-9926-017f80b6614c@redhat.com> [not found] ` <7e42faea-9f55-4722-a426-94be7fc3a49b@redhat.com> @ 2026-05-07 7:24 ` Zi Yan [not found] ` <B1D68BA2-11D2-4053-B715-F7704ED784DA@nvidia.com> 2026-05-08 20:06 ` David Hildenbrand (Arm) 4 siblings, 0 replies; 23+ messages in thread From: Zi Yan @ 2026-05-07 7:24 UTC (permalink / raw) To: Andrew Morton, Nico Pache Cc: David Hildenbrand, Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 29 Apr 2026, at 23:35, Zi Yan wrote: > Change the requirement to a file system with large folio support and the > supported order needs to include PMD_ORDER. > > Also add tests of opening a file with read write permission and populating > folios with writes. Reuse the XFS image from split_huge_page_test. > > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- > tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- > tools/testing/selftests/mm/run_vmtests.sh | 12 +- > 2 files changed, 102 insertions(+), 41 deletions(-) > Hi Andrew, Here is the second fixup to this patch. It addresses an issue that [SKIP] is always printed, if XFS is not present, even if khugepaged never runs, discovered by Nico. Thanks. 
From eb9a5c25434e3882423f621dc46281156eac843a Mon Sep 17 00:00:00 2001 From: Zi Yan <ziy@nvidia.com> Date: Thu, 7 May 2026 03:17:51 -0400 Subject: [PATCH] fix run_vmtests.sh to only print SKIP when khugepaged is selected Signed-off-by: Zi Yan <ziy@nvidia.com> --- tools/testing/selftests/mm/run_vmtests.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 854c5c3e3a6ae..b73921b2cac02 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -508,7 +508,7 @@ fi if [ -n "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then CATEGORY="thp" run_test ./khugepaged all:file ${SPLIT_HUGE_PAGE_TEST_XFS_PATH} -else +elif test_selected thp; then count_total=$(( count_total + 1 )) count_skip=$(( count_skip + 1 )) echo "[SKIP] ./khugepaged all:file" | tap_prefix -- 2.53.0 Best Regards, Yan, Zi ^ permalink raw reply related [flat|nested] 23+ messages in thread
[parent not found: <B1D68BA2-11D2-4053-B715-F7704ED784DA@nvidia.com>]
[parent not found: <3BFC4C26-1C97-40AA-B4B7-7472B9768565@nvidia.com>]
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged [not found] ` <3BFC4C26-1C97-40AA-B4B7-7472B9768565@nvidia.com> @ 2026-05-08 19:48 ` David Hildenbrand (Arm) 0 siblings, 0 replies; 23+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 19:48 UTC (permalink / raw) To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm, linux-kselftest On 4/30/26 17:27, Zi Yan wrote: > On 30 Apr 2026, at 11:16, Zi Yan wrote: > >> On 29 Apr 2026, at 11:35, Zi Yan wrote: >> >>> Change the requirement to a file system with large folio support and the >>> supported order needs to include PMD_ORDER. >>> >>> Also add tests of opening a file with read write permission and populating >>> folios with writes. Reuse the XFS image from split_huge_page_test. >>> >>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>> --- >>> tools/testing/selftests/mm/khugepaged.c | 131 +++++++++++++++------- >>> tools/testing/selftests/mm/run_vmtests.sh | 12 +- >>> 2 files changed, 102 insertions(+), 41 deletions(-) >>> > > <snip> > >>> -static void file_fault(void *p, unsigned long start, unsigned long end) >>> +static void file_fault_common(void *p, unsigned long start, unsigned long end, >>> + int madv_ops) >>> { >>> - if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) { >>> + if (madvise(((char *)p) + start, end - start, madv_ops)) { >>> perror("madvise(MADV_POPULATE_READ"); >> >> Sashiko: >> Since madv_ops can now be either MADV_POPULATE_READ or MADV_POPULATE_WRITE, >> will this hardcoded error message be misleading if the write fault path >> fails? >> >> Answer: >> Will send a fixup. 
> > > This is the fixup: > From 76e301cf5198f33d07492e224ec627b94902b4b6 Mon Sep 17 00:00:00 2001 > From: Zi Yan <ziy@nvidia.com> > Date: Thu, 30 Apr 2026 11:22:30 -0400 > Subject: [PATCH] selftests/mm: khugepaged perror fixup. > > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- > tools/testing/selftests/mm/khugepaged.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c > index 80b913185c643..97b8fcc490c76 100644 > --- a/tools/testing/selftests/mm/khugepaged.c > +++ b/tools/testing/selftests/mm/khugepaged.c > @@ -445,7 +445,10 @@ static void file_fault_common(void *p, unsigned long start, unsigned long end, > int madv_ops) > { > if (madvise(((char *)p) + start, end - start, madv_ops)) { > - perror("madvise(MADV_POPULATE_READ"); > + if (madv_ops == MADV_POPULATE_READ) > + perror("madvise(MADV_POPULATE_READ"); > + else if (madv_ops == MADV_POPULATE_WRITE) > + perror("madvise(MADV_POPULATE_WRITE"); Alternatively, just "madvise()". It's unexpected to fail in any case and would have to be debugged ... -- Cheers, David ^ permalink raw reply [flat|nested] 23+ messages in thread
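One way to avoid choosing the perror() string from madv_ops at run time is to give each fault helper its own madvise() call and message. The sketch below is only an illustration of that idea, not the selftest code: fault_read() and fault_write() are hypothetical names, they return -errno instead of calling exit() as the selftest does, and the fallback MADV_POPULATE_* values are the ones from the Linux UAPI headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* Linux UAPI value, for older libc headers */
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23	/* Linux UAPI value, for older libc headers */
#endif

/* Hypothetical helper: prefault [start, end) of p for reading. */
static int fault_read(void *p, unsigned long start, unsigned long end)
{
	assert(end > start);

	if (madvise((char *)p + start, end - start, MADV_POPULATE_READ)) {
		const int err = errno;	/* save it: perror() may change errno */

		perror("madvise(MADV_POPULATE_READ)");
		return -err;
	}
	return 0;
}

/* Hypothetical helper: prefault [start, end) of p for writing. */
static int fault_write(void *p, unsigned long start, unsigned long end)
{
	assert(end > start);

	if (madvise((char *)p + start, end - start, MADV_POPULATE_WRITE)) {
		const int err = errno;	/* save it: perror() may change errno */

		perror("madvise(MADV_POPULATE_WRITE)");
		return -err;
	}
	return 0;
}
```

With two separate helpers each message is fixed at the call site, so no branch on the advice value is needed. Using a plain "madvise()" string in a shared helper, as suggested above, would be the shorter alternative.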
* Re: [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
       [not found] ` <20260429153538.727855-7-ziy@nvidia.com>
                     ` (3 preceding siblings ...)
       [not found]   ` <B1D68BA2-11D2-4053-B715-F7704ED784DA@nvidia.com>
@ 2026-05-08 20:06   ` David Hildenbrand (Arm)
  4 siblings, 0 replies; 23+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-08 20:06 UTC
  To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 4/29/26 17:35, Zi Yan wrote:
> Change the requirement to a file system with large folio support and the
> supported order needs to include PMD_ORDER.
>
> Also add tests of opening a file with read write permission and populating
> folios with writes. Reuse the XFS image from split_huge_page_test.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  tools/testing/selftests/mm/khugepaged.c   | 131 +++++++++++++++-------
>  tools/testing/selftests/mm/run_vmtests.sh |  12 +-
>  2 files changed, 102 insertions(+), 41 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index a6bb9d50363d2..80b913185c643 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -49,7 +49,8 @@ struct mem_ops {
>  	const char *name;
>  };
>
> -static struct mem_ops *file_ops;
> +static struct mem_ops *read_only_file_ops;
> +static struct mem_ops *read_write_file_ops;
>  static struct mem_ops *anon_ops;
>  static struct mem_ops *shmem_ops;
>
> @@ -112,7 +113,8 @@ static void restore_settings(int sig)
>  static void save_settings(void)
>  {
>  	printf("Save THP and khugepaged settings...");
> -	if (file_ops && finfo.type == VMA_FILE)
> +	if ((read_only_file_ops || read_write_file_ops) &&
> +	    finfo.type == VMA_FILE)
>  		thp_set_read_ahead_path(finfo.dev_queue_read_ahead_path);
>  	thp_save_settings();
>
> @@ -364,11 +366,14 @@ static bool anon_check_huge(void *addr, int nr_hpages)
>  	return check_huge_anon(addr, nr_hpages, hpage_pmd_size);
>  }
>
> -static void *file_setup_area(int nr_hpages)
> +static void *file_setup_area_common(int nr_hpages, bool read_only)
>  {
>  	int fd;
>  	void *p;
>  	unsigned long size;
> +	int open_opt = read_only ? O_RDONLY : O_RDWR;
> +	int mmap_prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE);
> +	int mmap_opt = read_only ? MAP_PRIVATE : MAP_SHARED;

Can all be const and read better at the very top of the list.

Why do we go from private->shared depending on "read_only" parameter?

I'd assume we'd want to test MAP_SHARED for both (read-only + read-write)?

>
>  	unlink(finfo.path);  /* Cleanup from previous failed tests */
>  	printf("Creating %s for collapse%s...", finfo.path,
> @@ -399,14 +404,15 @@ static void *file_setup_area(int nr_hpages)
>  	munmap(p, size);
>  	success("OK");
>
> -	printf("Opening %s read only for collapse...", finfo.path);
> -	finfo.fd = open(finfo.path, O_RDONLY, 777);
> +	printf("Opening %s %s for collapse...", finfo.path,
> +	       read_only ? "read only" : "read-write");

"read-only" ?

> +	finfo.fd = open(finfo.path, open_opt, 777);
>  	if (finfo.fd < 0) {
>  		perror("open()");
>  		exit(EXIT_FAILURE);
>  	}
> -	p = mmap(BASE_ADDR, size, PROT_READ,
> -		 MAP_PRIVATE, finfo.fd, 0);
> +	p = mmap(BASE_ADDR, size, mmap_prot,
> +		 mmap_opt, finfo.fd, 0);

While at it, can fit that into a single line.

>  	if (p == MAP_FAILED || p != BASE_ADDR) {
>  		perror("mmap()");
>  		exit(EXIT_FAILURE);
> @@ -418,6 +424,16 @@ static void *file_setup_area(int nr_hpages)
>  	return p;
>  }
>
> +static void *file_setup_read_only_area(int nr_hpages)
> +{
> +	return file_setup_area_common(nr_hpages, /* read_only= */ true);
> +}
> +
> +static void *file_setup_read_write_area(int nr_hpages)
> +{
> +	return file_setup_area_common(nr_hpages, /* read_only= */ false);
> +}
> +
>  static void file_cleanup_area(void *p, unsigned long size)
>  {
>  	munmap(p, size);
> @@ -425,14 +441,25 @@ static void file_cleanup_area(void *p, unsigned long size)
>  	unlink(finfo.path);
>  }
>
> -static void file_fault(void *p, unsigned long start, unsigned long end)
> +static void file_fault_common(void *p, unsigned long start, unsigned long end,
> +			      int madv_ops)
>  {
> -	if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) {
> +	if (madvise(((char *)p) + start, end - start, madv_ops)) {
>  		perror("madvise(MADV_POPULATE_READ");
>  		exit(EXIT_FAILURE);
>  	}
>  }
>
> +static void file_fault_read(void *p, unsigned long start, unsigned long end)
> +{
> +	file_fault_common(p, start, end, MADV_POPULATE_READ);

Do we really want file_fault_common()? I'd say, just inline it.

In particular avoids checking the madv_ops to figure out the error message ...

> +}
> +
> +static void file_fault_write(void *p, unsigned long start, unsigned long end)
> +{
> +	file_fault_common(p, start, end, MADV_POPULATE_WRITE);
> +}
> +
>  static bool file_check_huge(void *addr, int nr_hpages)
>  {
>  	switch (finfo.type) {
> @@ -488,10 +515,18 @@ static struct mem_ops __anon_ops = {
>  	.name = "anon",
>  };

[...]

>  };
>
> +static bool is_tmpfs(struct mem_ops *ops)
> +{
> +	return (ops == &__read_only_file_ops ||
> +		ops == &__read_write_file_ops) &&

Can't this fit into two lines?

> +	       finfo.type == VMA_SHMEM;
> +}
> +
> +static bool is_anon(struct mem_ops *ops)
> +{
> +	return ops == &__anon_ops;
> +}
> +
>  static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>  			       struct mem_ops *ops, bool expect)
>  {
> -
>  static void alloc_at_fault(void)
>  {
>  	struct thp_settings settings = *thp_current_settings();
> @@ -1097,8 +1142,8 @@ static void usage(void)
>  	fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
>  	fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
>  	fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
> -	fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
> -	fprintf(stderr, "\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
> +	fprintf(stderr, "\n\t\"file,all\" mem_type requires a file system\n");
> +	fprintf(stderr, "\twith large folio support (order >= PMD order)\n");

"with PMD-sized large folio support"?

>  	fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
>  	fprintf(stderr, "\tmounted with huge=advise option for khugepaged tests to work\n");
>  	fprintf(stderr, "\n\tSupported Options:\n");
> @@ -1154,20 +1199,22 @@ static void parse_test_type(int argc, char **argv)
>  		usage();

-- 
Cheers,

David
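For readers following the review, here is a rough sketch of what two of the comments above might look like once applied. It is not taken from any posted revision of the patch. The struct file_map_flags type and the file_map_flags_for() helper are hypothetical names introduced only so the flag selection can be shown in isolation; using MAP_SHARED in both modes is an assumption based on the question raised in the review, which the thread has not settled. The fallback MADV_POPULATE_* values are the ones from the generic Linux uapi headers and are only used when the libc headers lack them.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers (generic Linux uapi values). */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Hypothetical type, not part of the patch: groups the per-mode flags. */
struct file_map_flags {
	int open_flags;
	int mmap_prot;
	int mmap_flags;
};

/* Hypothetical helper, not part of the patch. */
struct file_map_flags file_map_flags_for(bool read_only)
{
	/* const locals, declared first, as suggested in the review */
	const int open_flags = read_only ? O_RDONLY : O_RDWR;
	const int mmap_prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE);
	/* Assumption: MAP_SHARED for both modes (open question in the thread) */
	const int mmap_flags = MAP_SHARED;
	const struct file_map_flags flags = {
		.open_flags = open_flags,
		.mmap_prot = mmap_prot,
		.mmap_flags = mmap_flags,
	};

	return flags;
}

/*
 * Fault helpers with the common function inlined, so each one carries
 * its own error message instead of deriving it from a madvise argument.
 */
void file_fault_read(void *p, unsigned long start, unsigned long end)
{
	if (madvise((char *)p + start, end - start, MADV_POPULATE_READ)) {
		perror("madvise(MADV_POPULATE_READ)");
		exit(EXIT_FAILURE);
	}
}

void file_fault_write(void *p, unsigned long start, unsigned long end)
{
	if (madvise((char *)p + start, end - start, MADV_POPULATE_WRITE)) {
		perror("madvise(MADV_POPULATE_WRITE)");
		exit(EXIT_FAILURE);
	}
}
```

With the flags computed up front, the mmap() call in the setup function would fit on a single line, for example mmap(BASE_ADDR, size, flags.mmap_prot, flags.mmap_flags, finfo.fd, 0), which matches the "single line" remark in the review.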
[parent not found: <20260429091305.fd5a1c8c986c111527c2b024@linux-foundation.org>]
* Re: [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files
       [not found] ` <20260429091305.fd5a1c8c986c111527c2b024@linux-foundation.org>
@ 2026-05-09 22:10   ` Zi Yan
  2026-05-11  7:19     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 23+ messages in thread
From: Zi Yan @ 2026-05-09 22:10 UTC
  To: Andrew Morton, David Hildenbrand, Lance Yang
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 30 Apr 2026, at 0:13, Andrew Morton wrote:

> On Wed, 29 Apr 2026 11:29:10 -0400 Zi Yan <ziy@nvidia.com> wrote:
>
>> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
>> file-backed THPs for FSes with large folio support (the supported orders
>> need to include PMD_ORDER) by default, including for writable files.
>
> Thanks, I queued this up.

Hi Andrew,

I am planning to address David’s and Lance’s feedback and send a new series
after next week (after 05/17) as I will have no access to computers this
upcoming week.

Hi Lance and David,

Thank you for the feedback.

Best Regards,
Yan, Zi
* Re: [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files
  2026-05-09 22:10   ` [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Zi Yan
@ 2026-05-11  7:19     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 23+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-11 7:19 UTC
  To: Zi Yan, Andrew Morton, Lance Yang
  Cc: Matthew Wilcox (Oracle), Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 5/10/26 00:10, Zi Yan wrote:
> On 30 Apr 2026, at 0:13, Andrew Morton wrote:
>
>> On Wed, 29 Apr 2026 11:29:10 -0400 Zi Yan <ziy@nvidia.com> wrote:
>>
>>> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
>>> file-backed THPs for FSes with large folio support (the supported orders
>>> need to include PMD_ORDER) by default, including for writable files.
>>
>> Thanks, I queued this up.
>
> Hi Andrew,
>
> I am planning to address David’s and Lance’s feedback and send a new series
> after next week (after 05/17) as I will have no access to computers this
> upcoming week.

Cool, then no chance to get distracted! :)

Enjoy the time off!

-- 
Cheers,

David
Thread overview: 23+ messages
[not found] <20260429152924.727124-1-ziy@nvidia.com>
[not found] ` <20260429152924.727124-4-ziy@nvidia.com>
2026-05-07 4:29 ` [PATCH v5 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Lance Yang
2026-05-08 19:43 ` David Hildenbrand (Arm)
[not found] ` <20260429152924.727124-2-ziy@nvidia.com>
2026-05-07 6:08 ` [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-05-07 6:57 ` Zi Yan
2026-05-08 19:39 ` David Hildenbrand (Arm)
[not found] ` <20260429153538.727855-1-ziy@nvidia.com>
2026-05-07 12:48 ` [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Lance Yang
2026-05-08 2:52 ` Wei Yang
2026-05-08 3:22 ` Lance Yang
[not found] ` <20260429153538.727855-5-ziy@nvidia.com>
2026-05-08 7:01 ` [PATCH v5 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() Lance Yang
2026-05-08 19:46 ` David Hildenbrand (Arm)
[not found] ` <20260429153538.727855-9-ziy@nvidia.com>
2026-05-08 7:46 ` [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files Lance Yang
[not found] ` <CFECCB44-EEEA-4D3F-A505-3BA2C564C107@nvidia.com>
2026-05-08 20:09 ` David Hildenbrand (Arm)
2026-05-08 20:13 ` David Hildenbrand (Arm)
[not found] ` <20260429153538.727855-7-ziy@nvidia.com>
[not found] ` <52285e2c-af42-4c0d-9926-017f80b6614c@redhat.com>
2026-05-06 13:11 ` [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-05-08 19:51 ` David Hildenbrand (Arm)
[not found] ` <7e42faea-9f55-4722-a426-94be7fc3a49b@redhat.com>
2026-05-06 13:15 ` Zi Yan
2026-05-07 6:35 ` Nico Pache
2026-05-07 7:21 ` Zi Yan
2026-05-07 7:24 ` Zi Yan
[not found] ` <B1D68BA2-11D2-4053-B715-F7704ED784DA@nvidia.com>
[not found] ` <3BFC4C26-1C97-40AA-B4B7-7472B9768565@nvidia.com>
2026-05-08 19:48 ` David Hildenbrand (Arm)
2026-05-08 20:06 ` David Hildenbrand (Arm)
[not found] ` <20260429091305.fd5a1c8c986c111527c2b024@linux-foundation.org>
2026-05-09 22:10 ` [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Zi Yan
2026-05-11 7:19 ` David Hildenbrand (Arm)