From: WANG Rui <r@hev.cc>
To: usama.arif@linux.dev
Cc: Liam.Howlett@oracle.com, ajd@linux.ibm.com,
akpm@linux-foundation.org, apopple@nvidia.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, brauner@kernel.org,
catalin.marinas@arm.com, david@kernel.org, dev.jain@arm.com,
jack@suse.cz, kees@kernel.org, kevin.brodsky@arm.com,
lance.yang@linux.dev, linux-arm-kernel@lists.infradead.org,
linux-fsdevel@vger.kernel.l, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
lorenzo.stoakes@oracle.com, mhocko@suse.com, npache@redhat.com,
pasha.tatashin@soleen.com, r@hev.cc, rmclure@linux.ibm.com,
rppt@kernel.org, ryan.roberts@arm.com, surenb@google.com,
vbabka@kernel.org, viro@zeniv.linux.org.uk, willy@infradead.org
Subject: Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
Date: Sat, 21 Mar 2026 00:05:18 +0800
Message-ID: <20260320160519.80962-1-r@hev.cc>
In-Reply-To: <20260320140315.979307-4-usama.arif@linux.dev>
Hi Usama,
On Fri, Mar 20, 2026 at 10:04 PM Usama Arif <usama.arif@linux.dev> wrote:
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 8e89cc5b28200..042af81766fcd 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -49,6 +49,7 @@
> #include <uapi/linux/rseq.h>
> #include <asm/param.h>
> #include <asm/page.h>
> +#include <linux/pagemap.h>
>
> #ifndef ELF_COMPAT
> #define ELF_COMPAT 0
> @@ -488,19 +489,51 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
> return 0;
> }
>
> -static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
> +static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr,
> + struct file *filp)
> {
> unsigned long alignment = 0;
> + unsigned long max_folio_size = PAGE_SIZE;
> int i;
>
> + if (filp && filp->f_mapping)
> + max_folio_size = mapping_max_folio_size(filp->f_mapping);
From experiments (with 16K base pages), mapping_max_folio_size() appears to
depend on the filesystem: it returns 8M on ext4, while on btrfs it always
falls back to PAGE_SIZE (enabling CONFIG_BTRFS_EXPERIMENTAL=y may change this).
Capping the alignment by this value looks overly conservative and misses
practical optimization opportunities.
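A rough userspace model of the per-segment check in the hunk quoted below shows why the btrfs case adds nothing. It is a sketch, not kernel code: rounddown_pow_of_two() and is_aligned() are local stand-ins for the kernel helpers, and folio_alignment() is a hypothetical name for the logic in the loop body. Once max_folio_size equals PAGE_SIZE, size is clamped to PAGE_SIZE and the size > PAGE_SIZE test can never pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's rounddown_pow_of_two(); n must be >= 1. */
static uint64_t rounddown_pow_of_two(uint64_t n)
{
	uint64_t p = 1;

	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Local stand-in for IS_ALIGNED(); a must be a power of two. */
static bool is_aligned(uint64_t x, uint64_t a)
{
	assert(a && !(a & (a - 1)));
	return (x & (a - 1)) == 0;
}

/*
 * Hypothetical model of the per-segment check from the quoted hunk:
 * returns the alignment one PT_LOAD segment would request, or 0 if
 * the check does not fire.
 */
static uint64_t folio_alignment(uint64_t p_filesz, uint64_t p_vaddr,
				uint64_t p_offset, uint64_t page_size,
				uint64_t max_folio_size)
{
	uint64_t size;

	if (!p_filesz)
		return 0;
	size = rounddown_pow_of_two(p_filesz);
	if (size > max_folio_size)	/* min(size, max_folio_size) */
		size = max_folio_size;
	/* With max_folio_size == page_size this test can never pass. */
	if (size > page_size && is_aligned(p_vaddr, size) &&
	    is_aligned(p_offset, size))
		return size;
	return 0;
}
```

For a made-up 12M segment at vaddr 0 and offset 0 with 16K pages, folio_alignment() returns 8M when max_folio_size is 8M (the ext4 case above) and 0 when it is 16K (the btrfs fallback).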
> +
> for (i = 0; i < nr; i++) {
> if (cmds[i].p_type == PT_LOAD) {
> unsigned long p_align = cmds[i].p_align;
> + unsigned long size;
>
> /* skip non-power of two alignments as invalid */
> if (!is_power_of_2(p_align))
> continue;
> alignment = max(alignment, p_align);
> +
> + /*
> + * Try to align the binary to the largest folio
> + * size that the page cache supports, so the
> + * hardware can coalesce PTEs (e.g. arm64
> + * contpte) or use PMD mappings for large folios.
> + *
> + * Use the largest power-of-2 that fits within
> + * the segment size, capped by what the page
> + * cache will allocate. Only align when the
> + * segment's virtual address and file offset are
> + * already aligned to the folio size, as
> + * misalignment would prevent coalescing anyway.
> + *
> + * The segment size check avoids reducing ASLR
> + * entropy for small binaries that cannot
> + * benefit.
> + */
> + if (!cmds[i].p_filesz)
> + continue;
> + size = rounddown_pow_of_two(cmds[i].p_filesz);
> + size = min(size, max_folio_size);
> + if (size > PAGE_SIZE &&
> + IS_ALIGNED(cmds[i].p_vaddr, size) &&
> + IS_ALIGNED(cmds[i].p_offset, size))
> + alignment = max(alignment, size);
In my patch [1], eligible segments are aligned to PMD_SIZE, so THP can quickly
collapse them into PMD mappings with minimal warmup. That doesn't happen with
the current approach. I think allowing PMD_SIZE alignment when PMD_SIZE is
reasonably small (say <= 32M, i.e. 4K and 16K base pages on arm64) is worth
considering. All we really need here is virtual address alignment; the rest
can be left to THP in "always" mode, which can decide whether to collapse
based on memory pressure and other factors.
[1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
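The arithmetic behind the 32M figure, and one possible shape of such a cap, can be sketched in userspace. This is a hypothetical sketch, not the code from [1]: PMD_ALIGN_CAP, pmd_size() and pmd_alignment() are made-up names, and the eligibility rule (segment at least PMD_SIZE long, p_vaddr and p_offset congruent modulo PMD_SIZE) is an assumption about what "eligible" means. PMD_SIZE is modelled as PAGE_SIZE * (PAGE_SIZE / 8), which gives 2M for 4K pages, 32M for 16K pages and 512M for 64K pages, so a 32M cap excludes only the 64K case of those three.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap from the discussion: use PMD alignment only up to 32M. */
#define PMD_ALIGN_CAP (32ULL << 20)

/*
 * PMD_SIZE for a 64-bit page table with 8-byte entries: a page-table
 * page holds page_size / 8 PTEs, each mapping page_size bytes.
 */
static uint64_t pmd_size(uint64_t page_size)
{
	assert(page_size && !(page_size & (page_size - 1)));
	return page_size * (page_size / 8);
}

/*
 * Hypothetical helper: returns PMD_SIZE as the load-base alignment for
 * one PT_LOAD segment, or 0. "Eligible" is an assumption here: the
 * segment spans at least one PMD and its vaddr and file offset are
 * congruent modulo PMD_SIZE, so a PMD-aligned base keeps PMD-sized
 * ranges of the file PMD-aligned in memory.
 */
static uint64_t pmd_alignment(uint64_t p_filesz, uint64_t p_vaddr,
			      uint64_t p_offset, uint64_t page_size)
{
	uint64_t pmd = pmd_size(page_size);

	if (pmd > PMD_ALIGN_CAP || p_filesz < pmd)
		return 0;
	/* Unsigned wraparound is harmless here: pmd divides 2^64. */
	if ((p_vaddr - p_offset) & (pmd - 1))
		return 0;
	return pmd;
}
```

With 16K pages, a made-up 40M segment at vaddr 0 and offset 0 gets 32M; the same segment with 64K pages gets 0 because 512M exceeds the cap.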
> }
> }
>
> @@ -1104,7 +1137,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
> }
>
> /* Calculate any requested alignment. */
> - alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
> + alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum,
> + bprm->file);
>
> /**
> * DOC: PIE handling
> --
> 2.52.0
>
Thanks,
Rui
Thread overview: 15+ messages
2026-03-20 13:58 [PATCH v2 0/4] mm: improve large folio readahead and alignment for exec memory Usama Arif
2026-03-20 13:58 ` [PATCH v2 1/4] mm: bypass mmap_miss heuristic for VM_EXEC readahead Usama Arif
2026-03-20 14:18 ` Jan Kara
2026-03-20 14:26 ` Kiryl Shutsemau
2026-03-20 13:58 ` [PATCH v2 2/4] mm: replace exec_folio_order() with generic preferred_exec_order() Usama Arif
2026-03-20 14:41 ` Kiryl Shutsemau
2026-03-20 14:42 ` Jan Kara
2026-03-26 12:40 ` Usama Arif
2026-03-20 13:58 ` [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing Usama Arif
2026-03-20 14:55 ` Kiryl Shutsemau
2026-03-20 15:58 ` Matthew Wilcox
2026-03-20 16:05 ` WANG Rui [this message]
2026-03-20 17:47 ` Matthew Wilcox
2026-03-20 13:58 ` [PATCH v2 4/4] mm: align file-backed mmap to max folio order in thp_get_unmapped_area Usama Arif
2026-03-20 15:06 ` Kiryl Shutsemau