From: WANG Rui <r@hev.cc>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
David Hildenbrand <david@kernel.org>, Jan Kara <jack@suse.cz>,
Kees Cook <kees@kernel.org>, Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, WANG Rui <r@hev.cc>
Subject: [PATCH v5] binfmt_elf: Align eligible read-only PT_LOAD segments to PMD_SIZE for THP
Date: Fri, 13 Mar 2026 08:52:11 +0800
Message-ID: <20260313005211.882831-1-r@hev.cc>

File-backed mappings can only be collapsed into PMD-sized THP when the
virtual address and file offset are both hugepage-aligned and the
mapping is large enough to cover a huge page.

For ELF executables loaded by the kernel ELF binary loader, PT_LOAD
segments are aligned according to p_align, which is often just the
normal page size. As a result, large read-only segments that would
otherwise be eligible may fail to get PMD-sized mappings.

Even when a PT_LOAD segment does not start at a PMD-aligned address,
it may still contain a PMD-aligned subrange. In that case only that
subrange can be mapped with huge pages, while the unaligned head of
the segment (and any tail shorter than PMD_SIZE) remains mapped with
normal pages.
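
A minimal userspace sketch of the head/subrange split described above. It assumes a 2 MiB PMD size. The names `pmd_mappable_range()` and `struct pmd_range` are hypothetical and are not kernel APIs. The sketch also models the requirement that each huge page needs both an aligned virtual address and an aligned file offset. That requirement means the mapping's vaddr and offset must agree modulo PMD_SIZE; if they do not, no part of the mapping qualifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD, as on x86-64 with 4 KiB base pages. */
#define SKETCH_PMD_SIZE (UINT64_C(2) << 20)

/* Hypothetical result type: the huge-page-capable part of one mapping. */
struct pmd_range {
	uint64_t start;	/* first PMD-aligned address inside the mapping */
	uint64_t end;	/* end of the last whole PMD-sized block */
	uint64_t head;	/* bytes before start that stay on base pages */
};

static uint64_t align_up(uint64_t x, uint64_t a)   { return (x + a - 1) & ~(a - 1); }
static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }

/*
 * Returns true if at least one whole PMD-sized block fits. An aligned
 * virtual address only maps an aligned file offset if vaddr and offset
 * are congruent modulo the PMD size, so that is checked first.
 */
static bool pmd_mappable_range(uint64_t vaddr, uint64_t offset, uint64_t len,
			       struct pmd_range *r)
{
	uint64_t end = vaddr + len;

	if ((vaddr - offset) & (SKETCH_PMD_SIZE - 1))
		return false;

	r->start = align_up(vaddr, SKETCH_PMD_SIZE);
	r->end = align_down(end, SKETCH_PMD_SIZE);
	r->head = r->start - vaddr;
	return r->end > r->start;
}
```

For example, `pmd_mappable_range(0x401000, 0x1000, 0x500000, &r)` yields `r.start == 0x600000` and `r.end == 0x800000`. It also leaves a head of `0x1ff000` bytes that stays on base pages.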
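
In practice, many executables already have PMD-aligned file offsets
(and p_vaddr values) for their text segments. For
position-independent executables, however, the runtime virtual
address is still not aligned, because the load bias is only aligned
to the largest p_align, which is usually just the page size. Aligning
the segment to PMD_SIZE in such cases increases the chance of getting
PMD-sized THP mappings.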
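
A simplified model of why an aligned p_vaddr alone is not enough for a position-independent executable. It assumes, for illustration only, that the loader rounds a randomized base down to the largest PT_LOAD alignment and then adds p_vaddr. The real loader has more steps, and `sketch_runtime_vaddr()` and `sketch_is_pmd_aligned()` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/*
 * Simplified model (assumption): load bias = random base rounded down to
 * the largest PT_LOAD alignment; runtime address = load bias + p_vaddr.
 */
static uint64_t sketch_runtime_vaddr(uint64_t random_base, uint64_t max_align,
				     uint64_t p_vaddr)
{
	uint64_t load_bias = random_base & ~(max_align - 1);

	return load_bias + p_vaddr;
}

/* Assumes a 2 MiB PMD size. */
static int sketch_is_pmd_aligned(uint64_t addr)
{
	return (addr & ((UINT64_C(2) << 20) - 1)) == 0;
}
```

Consider a base of 0x555555554000 and p_vaddr 0x200000. A 4 KiB maximum alignment gives 0x555555754000, which is not 2 MiB aligned. A 2 MiB alignment gives 0x555555600000.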

This matters especially for 2MB huge pages, where many programs have
text segments only slightly larger than a single huge page. If the
start address is not aligned, the leading unaligned region can leave
no complete PMD-aligned 2MB range inside the segment, so no huge page
can be formed at all. For larger huge pages (e.g. 32MB), the unaligned
head region may be close to the huge page size itself, making the
potential performance impact even more significant.
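
A rough way to quantify the head effect is to let P = PMD_SIZE. Assume the segment starts h bytes past a P-aligned boundary (0 <= h < P), has length L, and that the vaddr/offset congruence holds. The number of whole huge pages N inside the segment is then:

```latex
\[
  \mathrm{head} = (P - h) \bmod P, \qquad
  N = \begin{cases}
        \left\lfloor \dfrac{L - \mathrm{head}}{P} \right\rfloor & L \ge \mathrm{head} \\
        0 & L < \mathrm{head}
      \end{cases}
\]
\[
  P = 2\,\mathrm{MiB},\ L = 3\,\mathrm{MiB}:\quad
  h = 0 \Rightarrow N = 1, \qquad
  h = 4\,\mathrm{KiB} \Rightarrow
  N = \left\lfloor \frac{1\,\mathrm{MiB} + 4\,\mathrm{KiB}}{2\,\mathrm{MiB}} \right\rfloor = 0
\]
```

With a page-aligned but PMD-unaligned start, the worst-case head is P minus one base page. An unaligned segment is therefore only guaranteed to contain a huge page when L >= 2P - PAGE_SIZE.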

A segment is considered eligible if:
* it is not writable (PF_W is clear),
* both p_vaddr and p_offset are PMD-aligned,
* its file size (p_filesz) is at least PMD_SIZE, and
* its existing p_align is smaller than PMD_SIZE.
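
The eligibility rules above can be mirrored in a userspace sketch over glibc's `<elf.h>` types. This could be used, for example, to check which PT_LOAD segments of an existing binary would be affected. The sketch makes several simplifying assumptions:
* it uses a fixed 2 MiB PMD (x86-64 with 4 KiB pages) in place of the kernel's PMD_SIZE,
* it omits the CONFIG_TRANSPARENT_HUGEPAGE and 32MB checks, and
* the `sketch_*` names are hypothetical.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD (x86-64, 4 KiB pages); the kernel uses PMD_SIZE. */
#define SKETCH_PMD_SIZE (UINT64_C(2) << 20)

/* Userspace mirror of the eligibility rules. */
static bool sketch_eligible(const Elf64_Phdr *ph)
{
	if (ph->p_type != PT_LOAD)
		return false;
	if (ph->p_flags & PF_W)				/* must not be writable */
		return false;
	if ((ph->p_vaddr | ph->p_offset) & (SKETCH_PMD_SIZE - 1))
		return false;				/* both must be PMD-aligned */
	if (ph->p_filesz < SKETCH_PMD_SIZE)		/* must cover one PMD */
		return false;
	return ph->p_align < SKETCH_PMD_SIZE;		/* not already aligned */
}

/* Alignment a segment would contribute to the maximum; 0 means ignored. */
static uint64_t sketch_effective_align(const Elf64_Phdr *ph)
{
	uint64_t a = ph->p_align;

	/* Non-PT_LOAD and non-power-of-two alignments are skipped. */
	if (ph->p_type != PT_LOAD || a == 0 || (a & (a - 1)))
		return 0;
	return sketch_eligible(ph) ? SKETCH_PMD_SIZE : a;
}
```

Take a read-only, executable segment with p_vaddr = p_offset = 0x200000, p_filesz = 40 MiB and p_align = 4 KiB. It is eligible and contributes a 2 MiB alignment. The same segment with PF_W set keeps its 4 KiB alignment.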

To avoid excessive virtual address space padding on systems with very
large PMD_SIZE values, this is only applied when PMD_SIZE <= 32MB.
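
The 32MB cut-off follows from how PMD_SIZE scales with the base page size. The back-of-the-envelope sketch below assumes 8-byte page-table entries and one page per table level, as on x86-64 and arm64; other architectures may differ. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: 8-byte entries and one page per page-table level, so a PMD
 * entry maps PAGE_SIZE * (PAGE_SIZE / 8) bytes.
 */
static uint64_t sketch_pmd_size(uint64_t page_size)
{
	return page_size * (page_size / 8);
}

/* Mirrors the "PMD_SIZE > SZ_32M" cut-off in should_align_to_pmd(). */
static int sketch_pmd_align_allowed(uint64_t page_size)
{
	return sketch_pmd_size(page_size) <= (UINT64_C(32) << 20);
}
```

This gives 2 MiB for 4 KiB pages, 32 MiB for 16 KiB pages and 512 MiB for 64 KiB pages, so only the last configuration is excluded.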

This mainly benefits large text segments of executables by reducing
iTLB pressure.

This only affects ELF executables loaded directly by the kernel ELF
binary loader. Shared libraries loaded from user space (e.g. by the
dynamic linker) are not affected.

Benchmark

Machine: AMD Ryzen 9 7950X (x86_64)
Binutils: 2.46
GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.

                     Without patch          With patch
  instructions   8,246,133,611,932   8,246,025,137,750
  cpu-cycles     8,001,028,142,928   7,565,925,107,502
  itlb-misses        3,672,158,331          26,821,242
  time elapsed             64.66 s             61.97 s

Instructions are basically unchanged. iTLB misses drop from ~3.67B to
~26.8M (~99.27% reduction), which results in a ~5.44% reduction in
cycles and ~4.16% shorter wall time for this workload.
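
The percentages quoted for the benchmark can be recomputed directly from the table. Below is a small sketch; the helper names are hypothetical.

```c
#include <stdio.h>

/* Relative reduction, in percent, from a baseline to a patched value. */
static double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Prints 99.27%, 5.44% and 4.16% for the three rows. */
static void print_reductions(void)
{
	printf("itlb-misses:  %.2f%%\n", pct_reduction(3672158331.0, 26821242.0));
	printf("cpu-cycles:   %.2f%%\n", pct_reduction(8001028142928.0, 7565925107502.0));
	printf("time elapsed: %.2f%%\n", pct_reduction(64.66, 61.97));
}
```
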
Signed-off-by: WANG Rui <r@hev.cc>
---
Changes since [v4]:
* Drop runtime THP mode check, only gate on CONFIG_TRANSPARENT_HUGEPAGE.
Changes since [v3]:
* Fix compilation failure under !CONFIG_TRANSPARENT_HUGEPAGE.
* No functional changes otherwise.
Changes since [v2]:
* Rename align_to_pmd() to should_align_to_pmd().
* Add benchmark results to the commit message.
Changes since [v1]:
* Drop the Kconfig option CONFIG_ELF_RO_LOAD_THP_ALIGNMENT.
* Move the alignment logic into a helper align_to_pmd() for clarity.
* Improve the comment explaining why we skip the optimization
when PMD_SIZE > 32MB.
[v4]: https://lore.kernel.org/linux-fsdevel/20260310031138.509730-1-r@hev.cc
[v3]: https://lore.kernel.org/linux-fsdevel/20260310013958.103636-1-r@hev.cc
[v2]: https://lore.kernel.org/linux-fsdevel/20260304114727.384416-1-r@hev.cc
[v1]: https://lore.kernel.org/linux-fsdevel/20260302155046.286650-1-r@hev.cc
---
fs/binfmt_elf.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index fb857faaf0d6..d5f5154079de 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -489,6 +489,32 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
 	return 0;
 }
 
+static inline bool should_align_to_pmd(const struct elf_phdr *cmd)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return false;
+
+	/*
+	 * Avoid excessive virtual address space padding when PMD_SIZE is very
+	 * large, since this function increases PT_LOAD alignment.
+	 * This threshold roughly matches the largest commonly used hugepage
+	 * sizes on current architectures (e.g. x86 2M, arm64 32M with 16K pages).
+	 */
+	if (PMD_SIZE > SZ_32M)
+		return false;
+
+	if (!IS_ALIGNED(cmd->p_vaddr | cmd->p_offset, PMD_SIZE))
+		return false;
+
+	if (cmd->p_filesz < PMD_SIZE)
+		return false;
+
+	if (cmd->p_flags & PF_W)
+		return false;
+
+	return true;
+}
+
 static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
 {
 	unsigned long alignment = 0;
@@ -501,6 +527,10 @@ static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
 			/* skip non-power of two alignments as invalid */
 			if (!is_power_of_2(p_align))
 				continue;
+
+			if (p_align < PMD_SIZE && should_align_to_pmd(&cmds[i]))
+				p_align = PMD_SIZE;
+
 			alignment = max(alignment, p_align);
 		}
 	}
--
2.53.0