public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP
@ 2026-03-10  3:11 WANG Rui
  2026-03-10  3:11 ` [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs WANG Rui
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: WANG Rui @ 2026-03-10  3:11 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, David Hildenbrand, Dev Jain, Jan Kara,
	Kees Cook, Lance Yang, Liam R. Howlett, Lorenzo Stoakes,
	Matthew Wilcox, Nico Pache, Ryan Roberts, Zi Yan
  Cc: linux-fsdevel, linux-mm, linux-kernel, WANG Rui

Changes since [v3]:
* Fixed compilation failure under !CONFIG_TRANSPARENT_HUGEPAGE.
* No functional changes otherwise.

Changes since [v2]:
* Renamed align_to_pmd() to should_align_to_pmd().
* Added benchmark results to the commit message.

Changes since [v1]:
* Dropped the Kconfig option CONFIG_ELF_RO_LOAD_THP_ALIGNMENT.
* Moved the alignment logic into a helper align_to_pmd() for clarity.
* Improved the comment explaining why we skip the optimization
  when PMD_SIZE > 32MB.

When Transparent Huge Pages (THP) are enabled in "always" mode,
file-backed read-only mappings can be backed by PMD-sized huge pages
if they meet the alignment and size requirements.

For ELF executables loaded by the kernel ELF binary loader, PT_LOAD
segments are normally aligned according to p_align, which is often
only page-sized. As a result, large read-only segments that are
otherwise eligible may fail to be mapped using PMD-sized THP.

A segment is considered eligible if:

* THP is in "always" mode,
* it is not writable,
* both p_vaddr and p_offset are PMD-aligned,
* its file size is at least PMD_SIZE, and
* its existing p_align is smaller than PMD_SIZE.
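As an illustrative model only (not the kernel code), the checks above can be
expressed like this; the constants assume an x86_64 configuration with 4K base
pages, where PMD_SIZE is 2MB:

```python
# Model of the eligibility rules listed above.
# Assumed values: PMD_SIZE = 2 MiB (x86_64, 4K pages), PF_W = 0x2.
PMD_SIZE = 2 * 1024 * 1024
SZ_32M = 32 * 1024 * 1024
PF_W = 0x2

def should_align_to_pmd(p_flags, p_vaddr, p_offset, p_filesz,
                        thp_always=True, pmd_size=PMD_SIZE):
    """Return True when a PT_LOAD segment qualifies for PMD alignment."""
    if pmd_size > SZ_32M:                # skip very large PMDs (padding cost)
        return False
    if not thp_always:                   # THP must be in "always" mode
        return False
    if (p_vaddr | p_offset) % pmd_size:  # both must be PMD-aligned
        return False
    if p_filesz < pmd_size:              # must span at least one PMD
        return False
    if p_flags & PF_W:                   # read-only segments only
        return False
    return True

# A 4 MiB read-only (PF_R = 0x4) segment at a PMD-aligned address qualifies:
print(should_align_to_pmd(0x4, 0x200000, 0x200000, 4 * 1024 * 1024))
```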

To avoid excessive address space padding on systems with very large
PMD_SIZE values, this optimization is applied only when PMD_SIZE <= 32MB,
since requiring larger alignments would be unreasonable, especially on
32-bit systems with a much more limited virtual address space.

This increases the likelihood that large text segments of ELF
executables are backed by PMD-sized THP, reducing TLB pressure and
improving performance for large binaries.

This only affects ELF executables loaded directly by the kernel
binary loader. Shared libraries loaded by user space (e.g. via the
dynamic linker) are not affected.
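To see whether a given executable's segments could qualify, the program
headers can be inspected with readelf (/bin/sh is just an example path; the
exact column layout varies with the binutils version):

```shell
# List PT_LOAD segments: Flg shows R/W/E, Align shows p_align.
# A read-only LOAD with FileSiz >= PMD_SIZE and PMD-aligned Offset/VirtAddr
# is a candidate for the alignment bump.
readelf -lW /bin/sh | grep -E 'Flg|LOAD'
```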

Benchmark

Machine: AMD Ryzen 9 7950X (x86_64)
Binutils: 2.46
GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)

Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.

                Without patch        With patch
instructions    8,246,133,611,932    8,246,025,137,750
cpu-cycles      8,001,028,142,928    7,565,925,107,502
itlb-misses     3,672,158,331        26,821,242
time elapsed    64.66 s              61.97 s

Instructions are basically unchanged. iTLB misses drop from ~3.67B to
~26M (a ~99.27% reduction), which translates into a ~5.44% reduction in
cycles and ~4.18% shorter wall time for this workload.
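The percentages can be reproduced from the raw counters above (a quick
sanity check, rounding to two decimals):

```python
# Recompute the reductions from the perf counters in the table above.
itlb_before, itlb_after = 3_672_158_331, 26_821_242
cyc_before, cyc_after = 8_001_028_142_928, 7_565_925_107_502

itlb_drop = 100 * (1 - itlb_after / itlb_before)
cyc_drop = 100 * (1 - cyc_after / cyc_before)
print(f"iTLB miss reduction: {itlb_drop:.2f}%")
print(f"cycle reduction:     {cyc_drop:.2f}%")
```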

[v3]: https://lore.kernel.org/linux-fsdevel/20260310013958.103636-1-r@hev.cc
[v2]: https://lore.kernel.org/linux-fsdevel/20260304114727.384416-1-r@hev.cc
[v1]: https://lore.kernel.org/linux-fsdevel/20260302155046.286650-1-r@hev.cc

WANG Rui (2):
  huge_mm: add stubs for THP-disabled configs
  binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for
    THP

 fs/binfmt_elf.c         | 29 +++++++++++++++++++++++++++++
 include/linux/huge_mm.h | 10 ++++++++++
 2 files changed, 39 insertions(+)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-10  3:11 [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
@ 2026-03-10  3:11 ` WANG Rui
  2026-03-12 15:53   ` David Hildenbrand (Arm)
  2026-03-10  3:11 ` [PATCH v4 2/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
  2026-03-13  8:41 ` [PATCH v4 0/2] " Baolin Wang
  2 siblings, 1 reply; 12+ messages in thread
From: WANG Rui @ 2026-03-10  3:11 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, David Hildenbrand, Dev Jain, Jan Kara,
	Kees Cook, Lance Yang, Liam R. Howlett, Lorenzo Stoakes,
	Matthew Wilcox, Nico Pache, Ryan Roberts, Zi Yan
  Cc: linux-fsdevel, linux-mm, linux-kernel, WANG Rui

hugepage_global_enabled() and hugepage_global_always() only exist
when CONFIG_TRANSPARENT_HUGEPAGE is set.  Add inline stubs that
return false to let code compile when THP is disabled.

Signed-off-by: WANG Rui <r@hev.cc>
---
 include/linux/huge_mm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a4d9f964dfde..badeebd4ea98 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -570,6 +570,16 @@ void map_anon_folio_pmd_nopf(struct folio *folio, pmd_t *pmd,
 
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static inline bool hugepage_global_enabled(void)
+{
+	return false;
+}
+
+static inline bool hugepage_global_always(void)
+{
+	return false;
+}
+
 static inline bool folio_test_pmd_mappable(struct folio *folio)
 {
 	return false;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP
  2026-03-10  3:11 [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
  2026-03-10  3:11 ` [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs WANG Rui
@ 2026-03-10  3:11 ` WANG Rui
  2026-03-13  8:41 ` [PATCH v4 0/2] " Baolin Wang
  2 siblings, 0 replies; 12+ messages in thread
From: WANG Rui @ 2026-03-10  3:11 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, David Hildenbrand, Dev Jain, Jan Kara,
	Kees Cook, Lance Yang, Liam R. Howlett, Lorenzo Stoakes,
	Matthew Wilcox, Nico Pache, Ryan Roberts, Zi Yan
  Cc: linux-fsdevel, linux-mm, linux-kernel, WANG Rui

When Transparent Huge Pages (THP) are enabled in "always" mode,
file-backed read-only mappings can be backed by PMD-sized huge pages
if they meet the alignment and size requirements.

For ELF executables loaded by the kernel ELF binary loader, PT_LOAD
segments are normally aligned according to p_align, which is often
only page-sized. As a result, large read-only segments that are
otherwise eligible may fail to be mapped using PMD-sized THP.

A segment is considered eligible if:

* THP is in "always" mode,
* it is not writable,
* both p_vaddr and p_offset are PMD-aligned,
* its file size is at least PMD_SIZE, and
* its existing p_align is smaller than PMD_SIZE.

To avoid excessive address space padding on systems with very large
PMD_SIZE values, this optimization is applied only when PMD_SIZE <= 32MB,
since requiring larger alignments would be unreasonable, especially on
32-bit systems with a much more limited virtual address space.

This increases the likelihood that large text segments of ELF
executables are backed by PMD-sized THP, reducing TLB pressure and
improving performance for large binaries.

This only affects ELF executables loaded directly by the kernel
binary loader. Shared libraries loaded by user space (e.g. via the
dynamic linker) are not affected.

Benchmark

Machine: AMD Ryzen 9 7950X (x86_64)
Binutils: 2.46
GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)

Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.

                Without patch        With patch
instructions    8,246,133,611,932    8,246,025,137,750
cpu-cycles      8,001,028,142,928    7,565,925,107,502
itlb-misses     3,672,158,331        26,821,242
time elapsed    64.66 s              61.97 s

Instructions are basically unchanged. iTLB misses drop from ~3.67B to
~26M (a ~99.27% reduction), which translates into a ~5.44% reduction in
cycles and ~4.18% shorter wall time for this workload.

Signed-off-by: WANG Rui <r@hev.cc>
---
 fs/binfmt_elf.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index fb857faaf0d6..a0d679c31ede 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -28,6 +28,7 @@
 #include <linux/highuid.h>
 #include <linux/compiler.h>
 #include <linux/highmem.h>
+#include <linux/huge_mm.h>
 #include <linux/hugetlb.h>
 #include <linux/pagemap.h>
 #include <linux/vmalloc.h>
@@ -489,6 +490,30 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
 	return 0;
 }
 
+static inline bool should_align_to_pmd(const struct elf_phdr *cmd)
+{
+	/*
+	 * Avoid excessive virtual address space padding when PMD_SIZE is very
+	 * large (e.g. some 64K base-page configurations).
+	 */
+	if (PMD_SIZE > SZ_32M)
+		return false;
+
+	if (!hugepage_global_always())
+		return false;
+
+	if (!IS_ALIGNED(cmd->p_vaddr | cmd->p_offset, PMD_SIZE))
+		return false;
+
+	if (cmd->p_filesz < PMD_SIZE)
+		return false;
+
+	if (cmd->p_flags & PF_W)
+		return false;
+
+	return true;
+}
+
 static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
 {
 	unsigned long alignment = 0;
@@ -501,6 +526,10 @@ static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
 			/* skip non-power of two alignments as invalid */
 			if (!is_power_of_2(p_align))
 				continue;
+
+			if (should_align_to_pmd(&cmds[i]) && p_align < PMD_SIZE)
+				p_align = PMD_SIZE;
+
 			alignment = max(alignment, p_align);
 		}
 	}
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-10  3:11 ` [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs WANG Rui
@ 2026-03-12 15:53   ` David Hildenbrand (Arm)
  2026-03-12 15:57     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12 15:53 UTC (permalink / raw)
  To: WANG Rui, Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, Dev Jain, Jan Kara, Kees Cook, Lance Yang,
	Liam R. Howlett, Lorenzo Stoakes, Matthew Wilcox, Nico Pache,
	Ryan Roberts, Zi Yan
  Cc: linux-fsdevel, linux-mm, linux-kernel

On 3/10/26 04:11, WANG Rui wrote:
> hugepage_global_enabled() and hugepage_global_always() only exist
> when CONFIG_TRANSPARENT_HUGEPAGE is set.  Add inline stubs that
> return false to let code compile when THP is disabled.
> 
> Signed-off-by: WANG Rui <r@hev.cc>
> ---
>  include/linux/huge_mm.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a4d9f964dfde..badeebd4ea98 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -570,6 +570,16 @@ void map_anon_folio_pmd_nopf(struct folio *folio, pmd_t *pmd,
>  
>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> +static inline bool hugepage_global_enabled(void)
> +{
> +	return false;
> +}
> +
> +static inline bool hugepage_global_always(void)
> +{
> +	return false;
> +}
> +
>  static inline bool folio_test_pmd_mappable(struct folio *folio)
>  {
>  	return false;

There are other ways to enable PMD THP. So I don't quite think this is
the right tool for the job.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-12 15:53   ` David Hildenbrand (Arm)
@ 2026-03-12 15:57     ` David Hildenbrand (Arm)
  2026-03-12 16:12       ` hev
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12 15:57 UTC (permalink / raw)
  To: WANG Rui, Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, Dev Jain, Jan Kara, Kees Cook, Lance Yang,
	Liam R. Howlett, Lorenzo Stoakes, Matthew Wilcox, Nico Pache,
	Ryan Roberts, Zi Yan
  Cc: linux-fsdevel, linux-mm, linux-kernel

On 3/12/26 16:53, David Hildenbrand (Arm) wrote:
> On 3/10/26 04:11, WANG Rui wrote:
>> hugepage_global_enabled() and hugepage_global_always() only exist
>> when CONFIG_TRANSPARENT_HUGEPAGE is set.  Add inline stubs that
>> return false to let code compile when THP is disabled.
>>
>> Signed-off-by: WANG Rui <r@hev.cc>
>> ---
>>  include/linux/huge_mm.h | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index a4d9f964dfde..badeebd4ea98 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -570,6 +570,16 @@ void map_anon_folio_pmd_nopf(struct folio *folio, pmd_t *pmd,
>>  
>>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>>  
>> +static inline bool hugepage_global_enabled(void)
>> +{
>> +	return false;
>> +}
>> +
>> +static inline bool hugepage_global_always(void)
>> +{
>> +	return false;
>> +}
>> +
>>  static inline bool folio_test_pmd_mappable(struct folio *folio)
>>  {
>>  	return false;
> 
> There are other ways to enable PMD THP. So I don't quite think this is
> the right tool for the job.

Ah, you care about file THPs ... gah.

Why can't we simply do the alignment without considering the current
setting?

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-12 15:57     ` David Hildenbrand (Arm)
@ 2026-03-12 16:12       ` hev
  2026-03-12 16:29         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 12+ messages in thread
From: hev @ 2026-03-12 16:12 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, Dev Jain, Jan Kara, Kees Cook, Lance Yang,
	Liam R. Howlett, Lorenzo Stoakes, Matthew Wilcox, Nico Pache,
	Ryan Roberts, Zi Yan, linux-fsdevel, linux-mm, linux-kernel

Hi David,

On Thu, Mar 12, 2026 at 11:57 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 3/12/26 16:53, David Hildenbrand (Arm) wrote:
> > On 3/10/26 04:11, WANG Rui wrote:
> >> hugepage_global_enabled() and hugepage_global_always() only exist
> >> when CONFIG_TRANSPARENT_HUGEPAGE is set.  Add inline stubs that
> >> return false to let code compile when THP is disabled.
> >>
> >> Signed-off-by: WANG Rui <r@hev.cc>
> >> ---
> >>  include/linux/huge_mm.h | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index a4d9f964dfde..badeebd4ea98 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -570,6 +570,16 @@ void map_anon_folio_pmd_nopf(struct folio *folio, pmd_t *pmd,
> >>
> >>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
> >>
> >> +static inline bool hugepage_global_enabled(void)
> >> +{
> >> +    return false;
> >> +}
> >> +
> >> +static inline bool hugepage_global_always(void)
> >> +{
> >> +    return false;
> >> +}
> >> +
> >>  static inline bool folio_test_pmd_mappable(struct folio *folio)
> >>  {
> >>      return false;
> >
> > There are other ways to enable PMD THP. So I don't quite think this is
> > the right tool for the job.
>
> Ah, you care about file THPs ... gah.
>
> Why can't we simply do the alignment without considering the current
> setting?

The main motivation for raising the alignment here is to increase the
chance of getting PMD-sized THPs for executable mappings.

If THP is not in "always" mode, the kernel will not automatically
collapse file-backed mappings into THPs, so the increased alignment
would not actually improve THP usage.

In that case we would only be introducing additional padding in the
virtual address layout, which slightly reduces ASLR entropy without
providing a practical benefit.

That's why the current code limits this to the "always" mode.
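To put a number on the entropy cost: raising the alignment from page size
to PMD size discards the low randomization bits in between (back-of-envelope
sketch, assuming 4K base pages and a 2MB PMD):

```python
from math import log2

PAGE_SIZE = 4096             # assumed base page size
PMD_SIZE = 2 * 1024 * 1024   # assumed PMD size (x86_64, 4K pages)

# Bits of mmap randomization lost when the load address must be
# PMD-aligned instead of merely page-aligned.
lost_bits = int(log2(PMD_SIZE // PAGE_SIZE))
print(lost_bits)  # 9
```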

Thanks,
Rui

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-12 16:12       ` hev
@ 2026-03-12 16:29         ` David Hildenbrand (Arm)
  2026-03-13  0:10           ` hev
  2026-03-13  9:47           ` Lance Yang
  0 siblings, 2 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12 16:29 UTC (permalink / raw)
  To: hev
  Cc: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, Dev Jain, Jan Kara, Kees Cook, Lance Yang,
	Liam R. Howlett, Lorenzo Stoakes, Matthew Wilcox, Nico Pache,
	Ryan Roberts, Zi Yan, linux-fsdevel, linux-mm, linux-kernel

On 3/12/26 17:12, hev wrote:
> Hi David,
> 
> On Thu, Mar 12, 2026 at 11:57 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 3/12/26 16:53, David Hildenbrand (Arm) wrote:
>>>
>>> There are other ways to enable PMD THP. So I don't quite think this is
>>> the right tool for the job.
>>
>> Ah, you care about file THPs ... gah.
>>
>> Why can't we simply do the alignment without considering the current
>> setting?
> 
> The main motivation of raising the alignment here is to increase the
> chance of getting PMD-sized THPs for executable mappings.
> 
> If THP is not in "always" mode, the kernel will not automatically
> collapse file-backed mappings into THPs, so the increased alignment
> would not actually improve THP usage.
> 
> In that case we would only be introducing additional padding in the
> virtual address layout, which slightly reduces ASLR entropy without
> providing a practical benefit.

Well, that parameter can get toggled at runtime later? Also, I think
that readahead code could end up allocating a PMD THP (I might be
wrong about that, the code is confusing).

Let's take a look at __get_unmapped_area(), where we don't care about
ASLR entropy for anonymous memory:

} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
	   && !addr /* no hint */
	   && IS_ALIGNED(len, PMD_SIZE)) {

Interestingly we had:

commit 34d7cf637c437d5c2a8a6ef23ea45193bad8a91c
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Fri Dec 6 15:03:45 2024 +0800

    mm: don't try THP alignment for FS without get_unmapped_area
    
    Commit ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") changes
    thp_get_unmapped_area() to thp_get_unmapped_area_vmflags() in
    __get_unmapped_area(), which doesn't initialize local get_area for
    anonymous mappings.  This leads to us always trying THP alignment even for
    file_operations which have a NULL ->get_unmapped_area() callback.
    
    Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
    boundaries") we only want to enable THP alignment for anonymous mappings,
    so add a !file check to avoid attempting THP alignment for file mappings.
    
    Found issue by code inspection.  THP alignment is used for easy or more
    pmd mappings, from vma side.  This may cause unnecessary VMA fragmentation
    and potentially worse performance on filesystems that do not actually
    support THPs and thus cannot benefit from the alignment.


I'm not sure about the "VMA fragmentation" argument, really. We only consider
stuff that is already multiples of PMD_SIZE.

Filesystem support for THPs is also not really something you would handle, and it's
a problem that solves itself over time as more filesystems keep adding support for
large folios.

So I think we should try limiting it to IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE),
but not checking the runtime toggle.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-12 16:29         ` David Hildenbrand (Arm)
@ 2026-03-13  0:10           ` hev
  2026-03-13  9:47           ` Lance Yang
  1 sibling, 0 replies; 12+ messages in thread
From: hev @ 2026-03-13  0:10 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, Dev Jain, Jan Kara, Kees Cook, Lance Yang,
	Liam R. Howlett, Lorenzo Stoakes, Matthew Wilcox, Nico Pache,
	Ryan Roberts, Zi Yan, linux-fsdevel, linux-mm, linux-kernel

On Fri, Mar 13, 2026 at 12:29 AM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 3/12/26 17:12, hev wrote:
> > Hi David,
> >
> > On Thu, Mar 12, 2026 at 11:57 PM David Hildenbrand (Arm)
> > <david@kernel.org> wrote:
> >>
> >> On 3/12/26 16:53, David Hildenbrand (Arm) wrote:
> >>>
> >>> There are other ways to enable PMD THP. So I don't quite think this is
> >>> the right tool for the job.
> >>
> >> Ah, you care about file THPs ... gah.
> >>
> >> Why can't we simply do the alignment without considering the current
> >> setting?
> >
> > The main motivation of raising the alignment here is to increase the
> > chance of getting PMD-sized THPs for executable mappings.
> >
> > If THP is not in "always" mode, the kernel will not automatically
> > collapse file-backed mappings into THPs, so the increased alignment
> > would not actually improve THP usage.
> >
> > In that case we would only be introducing additional padding in the
> > virtual address layout, which slightly reduces ASLR entropy without
> > providing a practical benefit.
>
> Well, that parameter can get toggled at runtime later? Also, I think
> that readahead code could end up allocating a PMD THP (I might be
> wrong about that, the code is confusing).
>
> Let's take a look at __get_unmapped_area(), where we don't care about
> ASLR entropy for anonymous memory:
>
> } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
>            && !addr /* no hint */
>            && IS_ALIGNED(len, PMD_SIZE)) {
>
> Interestingly we had:
>
> commit 34d7cf637c437d5c2a8a6ef23ea45193bad8a91c
> Author: Kefeng Wang <wangkefeng.wang@huawei.com>
> Date:   Fri Dec 6 15:03:45 2024 +0800
>
>     mm: don't try THP alignment for FS without get_unmapped_area
>
>     Commit ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") changes
>     thp_get_unmapped_area() to thp_get_unmapped_area_vmflags() in
>     __get_unmapped_area(), which doesn't initialize local get_area for
>     anonymous mappings.  This leads to us always trying THP alignment even for
>     file_operations which have a NULL ->get_unmapped_area() callback.
>
>     Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>     boundaries") we only want to enable THP alignment for anonymous mappings,
>     so add a !file check to avoid attempting THP alignment for file mappings.
>
>     Found issue by code inspection.  THP alignment is used for easy or more
>     pmd mappings, from vma side.  This may cause unnecessary VMA fragmentation
>     and potentially worse performance on filesystems that do not actually
>     support THPs and thus cannot benefit from the alignment.
>
>
> I'm not sure about the "VMA fragmentation" argument, really. We only consider
> stuff that is already multiples of PMD_SIZE.
>
> Filesystem support for THPs is also not really something you would handle, and it's
> a problem that solves itself over time as more filesystems keep adding support for
> large folios.
>
> So I think we should try limiting it to IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE),
> but not checking the runtime toggle.

That's a fair point about the runtime toggle. Limiting it to
IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) sounds reasonable.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP
  2026-03-10  3:11 [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
  2026-03-10  3:11 ` [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs WANG Rui
  2026-03-10  3:11 ` [PATCH v4 2/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
@ 2026-03-13  8:41 ` Baolin Wang
  2026-03-13 10:46   ` Usama Arif
  2026-03-13 14:39   ` hev
  2 siblings, 2 replies; 12+ messages in thread
From: Baolin Wang @ 2026-03-13  8:41 UTC (permalink / raw)
  To: WANG Rui, Alexander Viro, Andrew Morton, Barry Song,
	Christian Brauner, David Hildenbrand, Dev Jain, Jan Kara,
	Kees Cook, Lance Yang, Liam R. Howlett, Lorenzo Stoakes,
	Matthew Wilcox, Nico Pache, Ryan Roberts, Zi Yan, usama.arif
  Cc: linux-fsdevel, linux-mm, linux-kernel

CC Usama

On 3/10/26 11:11 AM, WANG Rui wrote:
> Changes since [v3]:
> * Fixed compilation failure under !CONFIG_TRANSPARENT_HUGEPAGE.
> * No functional changes otherwise.
> 
> Changes since [v2]:
> * Renamed align_to_pmd() to should_align_to_pmd().
> * Added benchmark results to the commit message.
> 
> Changes since [v1]:
> * Dropped the Kconfig option CONFIG_ELF_RO_LOAD_THP_ALIGNMENT.
> * Moved the alignment logic into a helper align_to_pmd() for clarity.
> * Improved the comment explaining why we skip the optimization
>    when PMD_SIZE > 32MB.
> 
> When Transparent Huge Pages (THP) are enabled in "always" mode,
> file-backed read-only mappings can be backed by PMD-sized huge pages
> if they meet the alignment and size requirements.
> 
> For ELF executables loaded by the kernel ELF binary loader, PT_LOAD
> segments are normally aligned according to p_align, which is often
> only page-sized. As a result, large read-only segments that are
> otherwise eligible may fail to be mapped using PMD-sized THP.
> 
> A segment is considered eligible if:
> 
> * THP is in "always" mode,
> * it is not writable,
> * both p_vaddr and p_offset are PMD-aligned,
> * its file size is at least PMD_SIZE, and
> * its existing p_align is smaller than PMD_SIZE.
> 
> To avoid excessive address space padding on systems with very large
> PMD_SIZE values, this optimization is applied only when PMD_SIZE <= 32MB,
> since requiring larger alignments would be unreasonable, especially on
> 32-bit systems with a much more limited virtual address space.
> 
> This increases the likelihood that large text segments of ELF
> executables are backed by PMD-sized THP, reducing TLB pressure and
> improving performance for large binaries.
> 
> This only affects ELF executables loaded directly by the kernel
> binary loader. Shared libraries loaded by user space (e.g. via the
> dynamic linker) are not affected.

Usama posted a similar patchset[1], and I think using exec_folio_order()
for exec-segment alignment is reasonable. In your case, you can override
exec_folio_order() to return a PMD-sized order.

[1] https://lore.kernel.org/all/20260310145406.3073394-1-usama.arif@linux.dev/

> Benchmark
> 
> Machine: AMD Ryzen 9 7950X (x86_64)
> Binutils: 2.46
> GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
> 
> Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
> 
>                  Without patch        With patch
> instructions    8,246,133,611,932    8,246,025,137,750
> cpu-cycles      8,001,028,142,928    7,565,925,107,502
> itlb-misses     3,672,158,331        26,821,242
> time elapsed    64.66 s              61.97 s
> 
> Instructions are basically unchanged. iTLB misses drop from ~3.67B to
> ~26M (~99.27% reduction), which results in about a ~5.44% reduction in
> cycles and ~4.18% shorter wall time for this workload.
> 
> [v3]: https://lore.kernel.org/linux-fsdevel/20260310013958.103636-1-r@hev.cc
> [v2]: https://lore.kernel.org/linux-fsdevel/20260304114727.384416-1-r@hev.cc
> [v1]: https://lore.kernel.org/linux-fsdevel/20260302155046.286650-1-r@hev.cc
> 
> WANG Rui (2):
>    huge_mm: add stubs for THP-disabled configs
>    binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for
>      THP
> 
>   fs/binfmt_elf.c         | 29 +++++++++++++++++++++++++++++
>   include/linux/huge_mm.h | 10 ++++++++++
>   2 files changed, 39 insertions(+)
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
  2026-03-12 16:29         ` David Hildenbrand (Arm)
  2026-03-13  0:10           ` hev
@ 2026-03-13  9:47           ` Lance Yang
  1 sibling, 0 replies; 12+ messages in thread
From: Lance Yang @ 2026-03-13  9:47 UTC (permalink / raw)
  To: David Hildenbrand (Arm), hev
  Cc: Alexander Viro, Andrew Morton, Baolin Wang, Barry Song,
	Christian Brauner, Dev Jain, Jan Kara, Kees Cook, Liam R. Howlett,
	Lorenzo Stoakes, Matthew Wilcox, Nico Pache, Ryan Roberts, Zi Yan,
	linux-fsdevel, linux-mm, linux-kernel, wangkefeng.wang



On 2026/3/13 00:29, David Hildenbrand (Arm) wrote:
> On 3/12/26 17:12, hev wrote:
>> Hi David,
>>
>> On Thu, Mar 12, 2026 at 11:57 PM David Hildenbrand (Arm)
>> <david@kernel.org> wrote:
>>>
>>> On 3/12/26 16:53, David Hildenbrand (Arm) wrote:
>>>>
>>>> There are other ways to enable PMD THP. So I don't quite think this is
>>>> the right tool for the job.
>>>
>>> Ah, you care about file THPs ... gah.
>>>
>>> Why can't we simply do the alignment without considering the current
>>> setting?
>>
>> The main motivation of raising the alignment here is to increase the
>> chance of getting PMD-sized THPs for executable mappings.
>>
>> If THP is not in "always" mode, the kernel will not automatically
>> collapse file-backed mappings into THPs, so the increased alignment
>> would not actually improve THP usage.
>>
>> In that case we would only be introducing additional padding in the
>> virtual address layout, which slightly reduces ASLR entropy without
>> providing a practical benefit.
> 
> Well, that parameter can get toggled at runtime later? Also, I think
> that readahead code could end up allocating a PMD THP (I might be
> wrong about that, the code is confusing).

Right. In do_sync_mmap_readahead(), if the VMA has VM_HUGEPAGE,
force_thp_readahead becomes true and ra->order is set to
HPAGE_PMD_ORDER, IIUC.

	/* Use the readahead code, even if readahead is disabled */
	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
	    (vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER)
		force_thp_readahead = true;

That order is then passed down to page_cache_ra_order() and finally to
filemap_alloc_folio().

	if (force_thp_readahead) {
[...]
		ra->async_size = HPAGE_PMD_NR;
		ra->order = HPAGE_PMD_ORDER;
		page_cache_ra_order(&ractl, ra);
		return fpin;
	}


For plain VM_EXEC, the code starts from exec_folio_order(), not
HPAGE_PMD_ORDER.

	if (vm_flags & VM_EXEC) {
[...]
		ra->order = exec_folio_order();
[...]
		ra->async_size = 0;
	}

The default exec_folio_order() is small, and only arm64 overrides it
(to 64K).

/*
 * Request exec memory is read into pagecache in at least 64K folios. This
 * size can be contpte-mapped when 4K base pages are in use (16 pages into
 * 1 iTLB entry), and HPA can coalesce it (4 pages into 1 TLB entry) when
 * 16K base pages are in use.
 */

#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
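To double-check the arithmetic behind that macro (64K target over the
assumed base page sizes):

```python
from math import log2

SZ_64K = 64 * 1024

def exec_folio_order(page_size):
    # Mirrors ilog2(SZ_64K >> PAGE_SHIFT) for a given base page size.
    return int(log2(SZ_64K // page_size))

print(exec_folio_order(4096))   # order 4 -> 16-page folios with 4K pages
print(exec_folio_order(16384))  # order 2 -> 4-page folios with 16K pages
```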

> 
> Let's take a look at __get_unmapped_area(), where we don't care about
> ASLR entropy for anonymous memory:
> 
> } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
> 	   && !addr /* no hint */
> 	   && IS_ALIGNED(len, PMD_SIZE)) {

Yeah. For anonymous memory, the kernel is willing to do THP-friendly
alignment, but it is constrained, of course :)

> Interestingly we had:
> 
> commit 34d7cf637c437d5c2a8a6ef23ea45193bad8a91c
> Author: Kefeng Wang <wangkefeng.wang@huawei.com>
> Date:   Fri Dec 6 15:03:45 2024 +0800
> 
>      mm: don't try THP alignment for FS without get_unmapped_area
>      
>      Commit ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") changes
>      thp_get_unmapped_area() to thp_get_unmapped_area_vmflags() in
>      __get_unmapped_area(), which doesn't initialize local get_area for
>      anonymous mappings.  This leads to us always trying THP alignment even for
>      file_operations which have a NULL ->get_unmapped_area() callback.
>      
>      Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>      boundaries") we only want to enable THP alignment for anonymous mappings,
>      so add a !file check to avoid attempting THP alignment for file mappings.
>      
>      Found issue by code inspection.  THP alignment is used for easy or more
>      pmd mappings, from vma side.  This may cause unnecessary VMA fragmentation
>      and potentially worse performance on filesystems that do not actually
>      support THPs and thus cannot benefit from the alignment.

Looks like this commit does not *ban* file-backed THP-friendly alignment
completely. It only prevents file mappings from getting it accidentally
via the generic fallback path.

Note that some filesystems still explicitly opt in with their own

.get_unmapped_area = thp_get_unmapped_area

for example, ext4, xfs, and btrfs.

So explicit filesystem opt-in is still allowed :)

> I'm not sure about the "VMA fragmentation" argument, really. We only consider
> stuff that is already multiples of PMD_SIZE.
>
> Filesystem support for THPs is also not really something you would handle, and it's
> a problem that solves itself over time as more filesystems keep adding support for
> large folios.
> 
> So I think we should try limiting it to IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE),
> but not checking the runtime toggle.

Good point! ELF layout is decided once at exec time, while the runtime
THP mode can change later.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP
  2026-03-13  8:41 ` [PATCH v4 0/2] " Baolin Wang
@ 2026-03-13 10:46   ` Usama Arif
  2026-03-13 14:39   ` hev
  1 sibling, 0 replies; 12+ messages in thread
From: Usama Arif @ 2026-03-13 10:46 UTC (permalink / raw)
  To: Baolin Wang, WANG Rui, Alexander Viro, Andrew Morton, Barry Song,
	Christian Brauner, David Hildenbrand, Dev Jain, Jan Kara,
	Kees Cook, Lance Yang, Liam R. Howlett, Lorenzo Stoakes,
	Matthew Wilcox, Nico Pache, Ryan Roberts, Zi Yan
  Cc: linux-fsdevel, linux-mm, linux-kernel



On 13/03/2026 11:41, Baolin Wang wrote:
> CC Usama
> 
> On 3/10/26 11:11 AM, WANG Rui wrote:
>> Changes since [v3]:
>> * Fixed compilation failure under !CONFIG_TRANSPARENT_HUGEPAGE.
>> * No functional changes otherwise.
>>
>> Changes since [v2]:
>> * Renamed align_to_pmd() to should_align_to_pmd().
>> * Added benchmark results to the commit message.
>>
>> Changes since [v1]:
>> * Dropped the Kconfig option CONFIG_ELF_RO_LOAD_THP_ALIGNMENT.
>> * Moved the alignment logic into a helper align_to_pmd() for clarity.
>> * Improved the comment explaining why we skip the optimization
>>    when PMD_SIZE > 32MB.
>>
>> When Transparent Huge Pages (THP) are enabled in "always" mode,
>> file-backed read-only mappings can be backed by PMD-sized huge pages
>> if they meet the alignment and size requirements.
>>
>> For ELF executables loaded by the kernel ELF binary loader, PT_LOAD
>> segments are normally aligned according to p_align, which is often
>> only page-sized. As a result, large read-only segments that are
>> otherwise eligible may fail to be mapped using PMD-sized THP.
>>
>> A segment is considered eligible if:
>>
>> * THP is in "always" mode,
>> * it is not writable,
>> * both p_vaddr and p_offset are PMD-aligned,
>> * its file size is at least PMD_SIZE, and
>> * its existing p_align is smaller than PMD_SIZE.
>>
>> To avoid excessive address space padding on systems with very large
>> PMD_SIZE values, this optimization is applied only when PMD_SIZE <= 32MB,
>> since requiring larger alignments would be unreasonable, especially on
>> 32-bit systems with a much more limited virtual address space.
>>
>> This increases the likelihood that large text segments of ELF
>> executables are backed by PMD-sized THP, reducing TLB pressure and
>> improving performance for large binaries.
>>
>> This only affects ELF executables loaded directly by the kernel
>> binary loader. Shared libraries loaded by user space (e.g. via the
>> dynamic linker) are not affected.
> 
> Usama posted a similar patchset[1], and I think using exec_folio_order() for exec-segment alignment is reasonable. In your case, you can override exec_folio_order() to return a PMD-sized order.
> 
> [1] https://lore.kernel.org/all/20260310145406.3073394-1-usama.arif@linux.dev/
> 

Thanks for the CC Baolin! Happy to see someone else noticed the same issue!

Yeah, I agree. I think piggybacking off exec_folio_order() as done in [1]
should be the right approach.

I also think there may be a bug in do_sync_mmap_readahead() around the
mmap_miss counter that needs to be fixed [2].


[1] https://lore.kernel.org/all/20260310145406.3073394-1-usama.arif@linux.dev/
[2] https://lore.kernel.org/all/20260310145406.3073394-3-usama.arif@linux.dev/

>> Benchmark
>>
>> Machine: AMD Ryzen 9 7950X (x86_64)
>> Binutils: 2.46
>> GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
>>
>> Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
>>
>>                  Without patch        With patch
>> instructions    8,246,133,611,932    8,246,025,137,750
>> cpu-cycles      8,001,028,142,928    7,565,925,107,502
>> itlb-misses     3,672,158,331        26,821,242
>> time elapsed    64.66 s              61.97 s
>>
>> Instructions are basically unchanged. iTLB misses drop from ~3.67B to
>> ~26M (~99.27% reduction), which results in about a ~5.44% reduction in
>> cycles and ~4.18% shorter wall time for this workload.
>>
>> [v3]: https://lore.kernel.org/linux-fsdevel/20260310013958.103636-1-r@hev.cc
>> [v2]: https://lore.kernel.org/linux-fsdevel/20260304114727.384416-1-r@hev.cc
>> [v1]: https://lore.kernel.org/linux-fsdevel/20260302155046.286650-1-r@hev.cc
>>
>> WANG Rui (2):
>>    huge_mm: add stubs for THP-disabled configs
>>    binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for
>>      THP
>>
>>   fs/binfmt_elf.c         | 29 +++++++++++++++++++++++++++++
>>   include/linux/huge_mm.h | 10 ++++++++++
>>   2 files changed, 39 insertions(+)
>>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP
  2026-03-13  8:41 ` [PATCH v4 0/2] " Baolin Wang
  2026-03-13 10:46   ` Usama Arif
@ 2026-03-13 14:39   ` hev
  1 sibling, 0 replies; 12+ messages in thread
From: hev @ 2026-03-13 14:39 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Alexander Viro, Andrew Morton, Barry Song, Christian Brauner,
	David Hildenbrand, Dev Jain, Jan Kara, Kees Cook, Lance Yang,
	Liam R. Howlett, Lorenzo Stoakes, Matthew Wilcox, Nico Pache,
	Ryan Roberts, Zi Yan, usama.arif, linux-fsdevel, linux-mm,
	linux-kernel

On Fri, Mar 13, 2026 at 4:41 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> CC Usama
>
> Usama posted a similar patchset[1], and I think using exec_folio_order()
> for exec-segment alignment is reasonable. In your case, you can override
> exec_folio_order() to return a PMD-sized order.
>
> [1]
> https://lore.kernel.org/all/20260310145406.3073394-1-usama.arif@linux.dev/

Thanks for the pointer!

Cheers,
Rui

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-03-13 14:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-10  3:11 [PATCH v4 0/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
2026-03-10  3:11 ` [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs WANG Rui
2026-03-12 15:53   ` David Hildenbrand (Arm)
2026-03-12 15:57     ` David Hildenbrand (Arm)
2026-03-12 16:12       ` hev
2026-03-12 16:29         ` David Hildenbrand (Arm)
2026-03-13  0:10           ` hev
2026-03-13  9:47           ` Lance Yang
2026-03-10  3:11 ` [PATCH v4 2/2] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP WANG Rui
2026-03-13  8:41 ` [PATCH v4 0/2] " Baolin Wang
2026-03-13 10:46   ` Usama Arif
2026-03-13 14:39   ` hev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox