From: Usama Arif <usama.arif@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
	david@kernel.org, willy@infradead.org, ryan.roberts@arm.com,
	linux-mm@kvack.org
Cc: r@hev.cc, jack@suse.cz, ajd@linux.ibm.com, apopple@nvidia.com,
	baohua@kernel.org, baolin.wang@linux.alibaba.com,
	brauner@kernel.org, catalin.marinas@arm.com, dev.jain@arm.com,
	kees@kernel.org, kevin.brodsky@arm.com, lance.yang@linux.dev,
	Liam.Howlett@oracle.com, linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Lorenzo Stoakes <ljs@kernel.org>,
	mhocko@suse.com, npache@redhat.com, pasha.tatashin@soleen.com,
	rmclure@linux.ibm.com, rppt@kernel.org, surenb@google.com,
	vbabka@kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	wilts.infradead.org, linux-fsdevel@vger.kernel.l@kernel.org,
	ziy@nvidia.com, hannes@cmpxchg.org, kas@kernel.org,
	shakeel.butt@linux.dev, leitao@debian.org, kernel-team@meta.com,
	Usama Arif <usama.arif@linux.dev>
Subject: [PATCH v3 3/4] elf: align ET_DYN base for PTE coalescing and PMD mapping
Date: Thu,  2 Apr 2026 11:08:24 -0700
Message-ID: <20260402181326.3107102-4-usama.arif@linux.dev>
In-Reply-To: <20260402181326.3107102-1-usama.arif@linux.dev>

For PIE binaries (ET_DYN), the load address is randomized at PAGE_SIZE
granularity via arch_mmap_rnd(). On arm64 with 64K base pages, this
means the binary is 64K-aligned, but contpte mapping requires 2M
(CONT_PTE_SIZE) alignment.
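
The arithmetic behind that mismatch can be sketched in plain userspace C.
This is only an illustration, not kernel code: the sketch_* names are
hypothetical, and the two constants simply restate the 64K and 2M figures
quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for arm64 with 64K base pages, taken from the text. */
#define SKETCH_PAGE_SIZE	(64UL * 1024)
#define SKETCH_CONT_PTE_SIZE	(2UL * 1024 * 1024)

/* True if addr is a multiple of size; size must be a power of 2. */
static bool sketch_is_aligned(uint64_t addr, uint64_t size)
{
	assert(size && (size & (size - 1)) == 0);
	return (addr & (size - 1)) == 0;
}

/* Number of page-aligned positions inside one contpte-sized block. */
static unsigned long sketch_slots_per_block(void)
{
	return SKETCH_CONT_PTE_SIZE / SKETCH_PAGE_SIZE;
}
```

With these values there are 32 page-aligned positions per 2M block, so a
base chosen at 64K granularity (for example 0x210000) is usually page
aligned but not CONT_PTE_SIZE aligned.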

Without proper virtual address alignment, readahead patches that
allocate large folios with aligned file offsets and physical addresses
cannot benefit from contpte mapping, as the contpte fold check in
contpte_set_ptes() requires the virtual address to be CONT_PTE_SIZE-
aligned.

Fix this by extending maximum_alignment() to consider folio alignment
at two tiers, matching the readahead allocation strategy:

- HPAGE_PMD_SIZE, so large folios can be PMD-mapped on
  architectures where PMD_SIZE is reasonable (e.g. 2M on x86-64
  and arm64 with 4K pages).

- exec_folio_order(), the minimum order for hardware TLB
  coalescing (e.g. arm64 contpte/HPA).

For each PT_LOAD segment, folio_alignment() tries the PMD tier first and
falls back to the exec_folio_order() tier only if the PMD tier yields no
alignment. Within a tier it takes the largest power of 2 that fits within
the segment's file size (p_filesz), capped at the tier's folio size, and
uses it only if both p_vaddr and p_offset are aligned to that size. This
ensures load_bias is folio-aligned so that file-offset-aligned folios
map to properly aligned virtual addresses, enabling hardware PTE
coalescing and PMD mappings for large folios.
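
For readers who want to experiment with the selection logic outside the
kernel, here is a rough userspace model of it. It is a sketch under
assumptions, not the patch itself: struct sketch_cfg and the sketch_*
helpers are hypothetical stand-ins for PAGE_SIZE, HPAGE_PMD_SIZE,
exec_folio_order() and rounddown_pow_of_two(), with a pmd_size of 0
modelling CONFIG_TRANSPARENT_HUGEPAGE being disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel configuration. */
struct sketch_cfg {
	uint64_t page_size;	/* models PAGE_SIZE */
	uint64_t pmd_size;	/* models HPAGE_PMD_SIZE; 0 = THP disabled */
	unsigned int exec_order;/* models exec_folio_order(); 0 = none */
};

/* Largest power of 2 that is <= v; v must be non-zero. */
static uint64_t sketch_rounddown_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v != 0);
	while (p <= v / 2)
		p *= 2;
	return p;
}

/* One tier: cap the segment size at target, then check alignment. */
static uint64_t sketch_try_tier(uint64_t seg, uint64_t target,
				uint64_t page, uint64_t vaddr,
				uint64_t offset)
{
	uint64_t size = seg < target ? seg : target;

	if (size > page && ((vaddr | offset) & (size - 1)) == 0)
		return size;
	return 0;
}

/* PMD tier first, exec folio order tier only as a fallback. */
static uint64_t sketch_folio_alignment(const struct sketch_cfg *cfg,
				       uint64_t vaddr, uint64_t offset,
				       uint64_t filesz)
{
	uint64_t seg, alignment = 0;

	if (!filesz)
		return 0;
	seg = sketch_rounddown_pow2(filesz);

	if (cfg->pmd_size)
		alignment = sketch_try_tier(seg, cfg->pmd_size,
					    cfg->page_size, vaddr, offset);
	if (!alignment && cfg->exec_order)
		alignment = sketch_try_tier(seg,
					    cfg->page_size << cfg->exec_order,
					    cfg->page_size, vaddr, offset);
	return alignment;
}
```

Assuming 4K pages, a 2M PMD and an exec folio order of 4 (64K), a 3M
segment at vaddr 0 and offset 0 yields 2M, while the same segment at
vaddr and offset 0x10000 fails the PMD tier and falls back to 64K. A
segment no larger than one page yields 0 in both tiers.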

The segment size check in folio_alignment() avoids reducing ASLR
entropy for small binaries that cannot benefit from large folio
alignment.
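
To put a number on the entropy cost, the following sketch assumes that
the ELF loader applies the result of maximum_alignment() by masking off
the low bits of the randomized load_bias. Under that assumption, each
doubling of the alignment beyond PAGE_SIZE removes one bit of
page-granular randomness. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of page-granular randomness removed by aligning to alignment. */
static unsigned int sketch_entropy_bits_lost(uint64_t page_size,
					     uint64_t alignment)
{
	unsigned int bits = 0;

	/* Both arguments must be powers of 2. */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	assert(alignment && (alignment & (alignment - 1)) == 0);

	while ((page_size << bits) < alignment)
		bits++;
	return bits;
}

/* Model of masking a randomized base down to the alignment. */
static uint64_t sketch_align_bias(uint64_t load_bias, uint64_t alignment)
{
	return alignment ? load_bias & ~(alignment - 1) : load_bias;
}
```

With 4K pages, a 2M alignment costs 9 bits; with 64K pages it costs 5.
A binary whose segments are too small to qualify keeps its alignment at
p_align and loses nothing.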

Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
 fs/binfmt_elf.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 16a56b6b3f6c..f84fae6daf23 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -488,6 +488,54 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
 	return 0;
 }
 
+/*
+ * Return the largest folio alignment for a PT_LOAD segment, so the
+ * hardware can coalesce PTEs (e.g. arm64 contpte) or use PMD mappings
+ * for large folios.
+ *
+ * Try PMD alignment so large folios can be PMD-mapped. Then try
+ * exec_folio_order() alignment for hardware TLB coalescing (e.g.
+ * arm64 contpte/HPA).
+ *
+ * Use the largest power-of-2 that fits within the segment size, capped
+ * by the target folio size.
+ * Only align when the segment's virtual address and file offset are
+ * already aligned to that size, as misalignment would prevent coalescing
+ * anyway.
+ *
+ * The segment size check avoids reducing ASLR entropy for small binaries
+ * that cannot benefit.
+ */
+static unsigned long folio_alignment(struct elf_phdr *cmd)
+{
+	unsigned long alignment = 0;
+	unsigned long seg_size;
+
+	if (!cmd->p_filesz)
+		return 0;
+
+	seg_size = rounddown_pow_of_two(cmd->p_filesz);
+
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		unsigned long size = min(seg_size, HPAGE_PMD_SIZE);
+
+		if (size > PAGE_SIZE &&
+		    IS_ALIGNED(cmd->p_vaddr | cmd->p_offset, size))
+			alignment = size;
+	}
+
+	if (!alignment && exec_folio_order()) {
+		unsigned long size = min(seg_size,
+					PAGE_SIZE << exec_folio_order());
+
+		if (size > PAGE_SIZE &&
+		    IS_ALIGNED(cmd->p_vaddr | cmd->p_offset, size))
+			alignment = size;
+	}
+
+	return alignment;
+}
+
 static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
 {
 	unsigned long alignment = 0;
@@ -501,6 +549,8 @@ static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
 			if (!is_power_of_2(p_align))
 				continue;
 			alignment = max(alignment, p_align);
+			alignment = max(alignment,
+					folio_alignment(&cmds[i]));
 		}
 	}
 
-- 
2.52.0

