From: Usama Arif <usama.arif@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
ryan.roberts@arm.com, david@kernel.org
Cc: ajd@linux.ibm.com, anshuman.khandual@arm.com, apopple@nvidia.com,
baohua@kernel.org, baolin.wang@linux.alibaba.com,
brauner@kernel.org, catalin.marinas@arm.com, dev.jain@arm.com,
jack@suse.cz, kees@kernel.org, kevin.brodsky@arm.com,
lance.yang@linux.dev, Liam.Howlett@oracle.com,
linux-arm-kernel@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, lorenzo.stoakes@oracle.com,
npache@redhat.com, rmclure@linux.ibm.com,
Al Viro <viro@zeniv.linux.org.uk>,
will@kernel.org, willy@infradead.org, ziy@nvidia.com,
hannes@cmpxchg.org, kas@kernel.org, shakeel.butt@linux.dev,
kernel-team@meta.com, Usama Arif <usama.arif@linux.dev>
Subject: [PATCH 0/4] arm64/mm: contpte-sized exec folios for 16K and 64K pages
Date: Tue, 10 Mar 2026 07:51:13 -0700
Message-ID: <20260310145406.3073394-1-usama.arif@linux.dev>
On arm64, the contpte hardware feature coalesces multiple contiguous PTEs
into a single iTLB entry, reducing iTLB pressure for large executable
mappings.
exec_folio_order() was introduced [1] to request readahead at an
arch-preferred folio order for executable memory, enabling contpte
mapping on the fault path.
However, several things prevent this from working optimally on 16K and
64K page configurations:
1. exec_folio_order() returns ilog2(SZ_64K >> PAGE_SHIFT), which only
produces the optimal contpte order for 4K pages. For 16K pages it
returns order 2 (64K) instead of order 7 (2M), and for 64K pages it
returns order 0 (64K) instead of order 5 (2M). Patch 1 fixes this by
using ilog2(CONT_PTES) which evaluates to the optimal order for all
page sizes.
2. Even with the optimal order, the mmap_miss heuristic in
do_sync_mmap_readahead() silently disables exec readahead after 100
page faults. The mmap_miss counter tracks whether readahead is useful
for mmap'd file access:
- Incremented by 1 in do_sync_mmap_readahead() on every page cache
miss (page needed IO).
- Decremented by N in filemap_map_pages() for N pages successfully
mapped via fault-around (pages found in cache without faulting,
evidence that readahead was useful). Only non-workingset pages
count; recently evicted and re-read pages are not treated as hits.
- Decremented by 1 in do_async_mmap_readahead() when a PG_readahead
marker page is found (indicates sequential consumption of readahead
pages).
When mmap_miss exceeds MMAP_LOTSAMISS (100), all readahead is
disabled. On 64K pages, both decrement paths are inactive:
- filemap_map_pages() is never called because fault_around_pages
(65536 >> PAGE_SHIFT = 1) disables should_fault_around(), which
requires fault_around_pages > 1. With only 1 page in the
fault-around window, there is nothing "around" to map.
- do_async_mmap_readahead() never fires for exec mappings because
exec readahead sets async_size = 0, so no PG_readahead markers
are placed.
With no decrements, mmap_miss monotonically increases past
MMAP_LOTSAMISS after 100 faults, disabling exec readahead
for the remainder of the mapping.
Patch 2 fixes this by moving the VM_EXEC readahead block
above the mmap_miss check, since exec readahead is targeted (one
folio at the fault location, async_size=0) rather than speculative
prefetch.
3. Even with correct folio order and readahead, contpte mapping requires
the virtual address to be aligned to CONT_PTE_SIZE (2M on 64K pages).
The readahead path aligns file offsets and the buddy allocator aligns
physical memory, but the virtual address depends on the VMA start.
For PIE binaries, ASLR randomizes the load address at PAGE_SIZE (64K)
granularity, giving only a 1/32 chance of 2M alignment. When
misaligned, contpte_set_ptes() never sets the contiguous PTE bit for
any folio in the VMA, resulting in zero iTLB coalescing benefit.
Patch 3 fixes this for the main binary by bumping the ELF loader's
alignment to PAGE_SIZE << exec_folio_order() for ET_DYN binaries.
Patch 4 fixes this for shared libraries by adding a contpte-size
alignment fallback in thp_get_unmapped_area_vmflags(). The existing
PMD_SIZE alignment (512M on 64K pages) is too large for typical shared
libraries, so this smaller fallback (2M) succeeds where PMD fails.
I created a benchmark that mmaps a large executable file and calls
RET-stub functions at PAGE_SIZE offsets across it. "Cold" measures
fault + readahead cost. "Random" first faults in all pages with a
sequential sweep (not measured), then measures time for calling random
offsets, isolating iTLB miss cost for scattered execution.
Benchmark results on Neoverse V2 (Grace), arm64 with 64K base pages,
using a 512MB executable file on ext4, averaged over 3 runs:
Phase      | Baseline | Patched | Improvement
-----------|----------|---------|------------
Cold fault | 83.4 ms  | 41.3 ms | 50% faster
Random     | 76.0 ms  | 58.3 ms | 23% faster
[1] https://lore.kernel.org/all/20250430145920.3748738-6-ryan.roberts@arm.com/
Usama Arif (4):
arm64: request contpte-sized folios for exec memory
mm: bypass mmap_miss heuristic for VM_EXEC readahead
elf: align ET_DYN base to exec folio order for contpte mapping
mm: align file-backed mmap to exec folio order in
thp_get_unmapped_area
arch/arm64/include/asm/pgtable.h | 9 ++--
fs/binfmt_elf.c | 15 +++++++
mm/filemap.c | 72 +++++++++++++++++---------------
mm/huge_memory.c | 17 ++++++++
4 files changed, 75 insertions(+), 38 deletions(-)
--
2.47.3