From: Usama Arif <usama.arif@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
	david@kernel.org, willy@infradead.org, ryan.roberts@arm.com,
	linux-mm@kvack.org
Cc: r@hev.cc, jack@suse.cz, ajd@linux.ibm.com, apopple@nvidia.com,
	baohua@kernel.org, baolin.wang@linux.alibaba.com,
	brauner@kernel.org, catalin.marinas@arm.com, dev.jain@arm.com,
	kees@kernel.org, kevin.brodsky@arm.com, lance.yang@linux.dev,
	Liam.Howlett@oracle.com, linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	lorenzo.stoakes@oracle.com, mhocko@suse.com, npache@redhat.com,
	pasha.tatashin@soleen.com, rmclure@linux.ibm.com,
	rppt@kernel.org, surenb@google.com, vbabka@kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	wilts.infradead.org, linux-fsdevel@vger.kernel.l@kernel.org,
	ziy@nvidia.com, hannes@cmpxchg.org, kas@kernel.org,
	shakeel.butt@linux.dev, kernel-team@meta.com,
	Usama Arif <usama.arif@linux.dev>
Subject: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
Date: Fri, 20 Mar 2026 06:58:53 -0700
Message-ID: <20260320140315.979307-4-usama.arif@linux.dev>
In-Reply-To: <20260320140315.979307-1-usama.arif@linux.dev>

For PIE binaries (ET_DYN), the load address is randomized at PAGE_SIZE
granularity via arch_mmap_rnd(). On arm64 with 64K base pages, this
means the binary is 64K-aligned, but contpte mapping requires 2M
(CONT_PTE_SIZE) alignment.

Without proper virtual address alignment, readahead patches that
allocate 2M folios with 2M-aligned file offsets and physical addresses
cannot benefit from contpte mapping, as the contpte fold check in
contpte_set_ptes() requires the virtual address to be CONT_PTE_SIZE-
aligned.

Fix this by extending maximum_alignment() to consider the maximum folio
size supported by the page cache (via mapping_max_folio_size()). For
each PT_LOAD segment, the alignment is bumped to the largest
power-of-2 that fits within the segment size, capped by the max folio
size the filesystem will allocate, if:

  - Both p_vaddr and p_offset are aligned to that size
  - The segment is large enough that this size exceeds PAGE_SIZE
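
The per-segment rule above can be sketched in userspace C roughly as
follows. This is an illustration only, not the kernel code:
rounddown_pow2() and segment_folio_align() are hypothetical names, the
page size and max folio size are passed in as plain parameters, and
max_folio_size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's rounddown_pow_of_two(); n must be non-zero. */
static uint64_t rounddown_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical helper: return the folio-derived alignment for one
 * PT_LOAD segment, or 0 if the segment does not qualify.
 * max_folio_size is assumed to be a power of two.
 */
static uint64_t segment_folio_align(uint64_t p_vaddr, uint64_t p_offset,
				    uint64_t p_filesz, uint64_t max_folio_size,
				    uint64_t page_size)
{
	uint64_t size;

	if (p_filesz == 0)
		return 0;

	/* Largest power of two that fits in the segment, capped by the page cache. */
	size = rounddown_pow2(p_filesz);
	if (size > max_folio_size)
		size = max_folio_size;

	/* Nothing to gain beyond ordinary page alignment. */
	if (size <= page_size)
		return 0;

	/* Both the virtual address and the file offset must already be aligned. */
	if ((p_vaddr & (size - 1)) || (p_offset & (size - 1)))
		return 0;

	return size;
}
```

With 64K pages and a 2M max folio size, a 3M segment at p_vaddr 0 and
p_offset 0 would yield 2M under this sketch, while a 40K segment would
yield 0 and leave the existing p_align-based alignment untouched.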

This ensures load_bias is folio-aligned so that file-offset-aligned
folios map to properly aligned virtual addresses, enabling hardware PTE
coalescing (e.g. arm64 contpte) and PMD mappings for large folios.
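
To make the address arithmetic concrete, here is a small userspace
sketch of where a folio at a given file offset lands in virtual memory.
The helper names and the example addresses are made up for
illustration; the only assumption is the usual ELF mapping relation
vaddr = load_bias + p_vaddr + (file_off - p_offset).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: virtual address at which file offset file_off is mapped. */
static uint64_t folio_vaddr(uint64_t load_bias, uint64_t p_vaddr,
			    uint64_t p_offset, uint64_t file_off)
{
	return load_bias + p_vaddr + (file_off - p_offset);
}

/* True if addr is a multiple of size; size must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t size)
{
	return (addr & (size - 1)) == 0;
}
```

For a 2M-aligned folio at file offset 0x400000 with p_vaddr and
p_offset both 0, a 2M-aligned load_bias such as 0xaaaa00200000 gives a
2M-aligned virtual address, whereas a load_bias that is only
64K-aligned, such as 0xaaaa00210000, does not, so the mapping could not
be folded into a contpte block.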

The segment size check avoids reducing ASLR entropy for small binaries
that cannot benefit from large folio alignment.

Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
 fs/binfmt_elf.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8e89cc5b28200..042af81766fcd 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -49,6 +49,7 @@
 #include <uapi/linux/rseq.h>
 #include <asm/param.h>
 #include <asm/page.h>
+#include <linux/pagemap.h>
 
 #ifndef ELF_COMPAT
 #define ELF_COMPAT 0
@@ -488,19 +489,51 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
 	return 0;
 }
 
-static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
+static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr,
+				       struct file *filp)
 {
 	unsigned long alignment = 0;
+	unsigned long max_folio_size = PAGE_SIZE;
 	int i;
 
+	if (filp && filp->f_mapping)
+		max_folio_size = mapping_max_folio_size(filp->f_mapping);
+
 	for (i = 0; i < nr; i++) {
 		if (cmds[i].p_type == PT_LOAD) {
 			unsigned long p_align = cmds[i].p_align;
+			unsigned long size;
 
 			/* skip non-power of two alignments as invalid */
 			if (!is_power_of_2(p_align))
 				continue;
 			alignment = max(alignment, p_align);
+
+			/*
+			 * Try to align the binary to the largest folio
+			 * size that the page cache supports, so the
+			 * hardware can coalesce PTEs (e.g. arm64
+			 * contpte) or use PMD mappings for large folios.
+			 *
+			 * Use the largest power-of-2 that fits within
+			 * the segment size, capped by what the page
+			 * cache will allocate. Only align when the
+			 * segment's virtual address and file offset are
+			 * already aligned to the folio size, as
+			 * misalignment would prevent coalescing anyway.
+			 *
+			 * The segment size check avoids reducing ASLR
+			 * entropy for small binaries that cannot
+			 * benefit.
+			 */
+			if (!cmds[i].p_filesz)
+				continue;
+			size = rounddown_pow_of_two(cmds[i].p_filesz);
+			size = min(size, max_folio_size);
+			if (size > PAGE_SIZE &&
+			    IS_ALIGNED(cmds[i].p_vaddr, size) &&
+			    IS_ALIGNED(cmds[i].p_offset, size))
+				alignment = max(alignment, size);
 		}
 	}
 
@@ -1104,7 +1137,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			}
 
 			/* Calculate any requested alignment. */
-			alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
+			alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum,
+						      bprm->file);
 
 			/**
 			 * DOC: PIE handling
-- 
2.52.0


Thread overview: 15+ messages
2026-03-20 13:58 [PATCH v2 0/4] mm: improve large folio readahead and alignment for exec memory Usama Arif
2026-03-20 13:58 ` [PATCH v2 1/4] mm: bypass mmap_miss heuristic for VM_EXEC readahead Usama Arif
2026-03-20 14:18   ` Jan Kara
2026-03-20 14:26   ` Kiryl Shutsemau
2026-03-20 13:58 ` [PATCH v2 2/4] mm: replace exec_folio_order() with generic preferred_exec_order() Usama Arif
2026-03-20 14:41   ` Kiryl Shutsemau
2026-03-20 14:42   ` Jan Kara
2026-03-26 12:40     ` Usama Arif
2026-03-20 13:58 ` Usama Arif [this message]
2026-03-20 14:55   ` [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing Kiryl Shutsemau
2026-03-20 15:58   ` Matthew Wilcox
2026-03-20 16:05   ` WANG Rui
2026-03-20 17:47     ` Matthew Wilcox
2026-03-20 13:58 ` [PATCH v2 4/4] mm: align file-backed mmap to max folio order in thp_get_unmapped_area Usama Arif
2026-03-20 15:06   ` Kiryl Shutsemau
