* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
[not found] ` <20260320140315.979307-4-usama.arif@linux.dev>
@ 2026-03-20 14:55 ` Kiryl Shutsemau
2026-03-20 15:58 ` Matthew Wilcox
2026-03-20 16:05 ` WANG Rui
2 siblings, 0 replies; 16+ messages in thread
From: Kiryl Shutsemau @ 2026-03-20 14:55 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, david, willy, ryan.roberts, linux-mm, r, jack, ajd,
apopple, baohua, baolin.wang, brauner, catalin.marinas, dev.jain,
kees, kevin.brodsky, lance.yang, Liam.Howlett, linux-arm-kernel,
linux-fsdevel, linux-kernel, lorenzo.stoakes, mhocko, npache,
pasha.tatashin, rmclure, rppt, surenb, vbabka, Al Viro, wilts,
ziy, hannes, shakeel.butt, kernel-team
On Fri, Mar 20, 2026 at 06:58:53AM -0700, Usama Arif wrote:
> For PIE binaries (ET_DYN), the load address is randomized at PAGE_SIZE
> granularity via arch_mmap_rnd(). On arm64 with 64K base pages, this
> means the binary is 64K-aligned, but contpte mapping requires 2M
> (CONT_PTE_SIZE) alignment.
>
> Without proper virtual address alignment, readahead patches that
> allocate 2M folios with 2M-aligned file offsets and physical addresses
> cannot benefit from contpte mapping, as the contpte fold check in
> contpte_set_ptes() requires the virtual address to be CONT_PTE_SIZE-
> aligned.
>
> Fix this by extending maximum_alignment() to consider the maximum folio
> size supported by the page cache (via mapping_max_folio_size()). For
> each PT_LOAD segment, the alignment is bumped to the largest
> power-of-2 that fits within the segment size, capped by the max folio
> size the filesystem will allocate, if:
>
> - Both p_vaddr and p_offset are aligned to that size
> - The segment is large enough (p_filesz >= size)
>
> This ensures load_bias is folio-aligned so that file-offset-aligned
> folios map to properly aligned virtual addresses, enabling hardware PTE
> coalescing (e.g. arm64 contpte) and PMD mappings for large folios.
>
> The segment size check avoids reducing ASLR entropy for small binaries
> that cannot benefit from large folio alignment.
>
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
[not found] ` <20260320140315.979307-4-usama.arif@linux.dev>
2026-03-20 14:55 ` [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing Kiryl Shutsemau
@ 2026-03-20 15:58 ` Matthew Wilcox
2026-03-27 16:51 ` Usama Arif
2026-03-20 16:05 ` WANG Rui
2 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2026-03-20 15:58 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, david, ryan.roberts, linux-mm, r, jack, ajd,
apopple, baohua, baolin.wang, brauner, catalin.marinas, dev.jain,
kees, kevin.brodsky, lance.yang, Liam.Howlett, linux-arm-kernel,
linux-fsdevel, linux-kernel, lorenzo.stoakes, mhocko, npache,
pasha.tatashin, rmclure, rppt, surenb, vbabka, Al Viro,
wilts.infradead.org, ziy, hannes, kas, shakeel.butt, kernel-team
On Fri, Mar 20, 2026 at 06:58:53AM -0700, Usama Arif wrote:
> -static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
> +static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr,
> + struct file *filp)
> {
> unsigned long alignment = 0;
> + unsigned long max_folio_size = PAGE_SIZE;
> int i;
>
> + if (filp && filp->f_mapping)
> + max_folio_size = mapping_max_folio_size(filp->f_mapping);
Under what circumstances can bprm->file be NULL?
Also we tend to prefer the name "file" rather than "filp" for new code
(yes, there's a lot of old code out there).
> +
> + /*
> + * Try to align the binary to the largest folio
> + * size that the page cache supports, so the
> + * hardware can coalesce PTEs (e.g. arm64
> + * contpte) or use PMD mappings for large folios.
> + *
> + * Use the largest power-of-2 that fits within
> + * the segment size, capped by what the page
> + * cache will allocate. Only align when the
> + * segment's virtual address and file offset are
> + * already aligned to the folio size, as
> + * misalignment would prevent coalescing anyway.
> + *
> + * The segment size check avoids reducing ASLR
> + * entropy for small binaries that cannot
> + * benefit.
> + */
> + if (!cmds[i].p_filesz)
> + continue;
> + size = rounddown_pow_of_two(cmds[i].p_filesz);
> + size = min(size, max_folio_size);
> + if (size > PAGE_SIZE &&
> + IS_ALIGNED(cmds[i].p_vaddr, size) &&
> + IS_ALIGNED(cmds[i].p_offset, size))
> + alignment = max(alignment, size);
Can this not all be factored out into a different function? Also, I
think it was done a bit better here:
https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc/
+ if (!IS_ALIGNED(cmd->p_vaddr | cmd->p_offset, PMD_SIZE))
+ return false;
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
2026-03-20 15:58 ` Matthew Wilcox
@ 2026-03-27 16:51 ` Usama Arif
0 siblings, 0 replies; 16+ messages in thread
From: Usama Arif @ 2026-03-27 16:51 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, david, ryan.roberts, linux-mm, r, jack, ajd,
apopple, baohua, baolin.wang, brauner, catalin.marinas, dev.jain,
kees, kevin.brodsky, lance.yang, Liam.Howlett, linux-arm-kernel,
linux-fsdevel, linux-kernel, lorenzo.stoakes, mhocko, npache,
pasha.tatashin, rmclure, rppt, surenb, vbabka, Al Viro,
wilts.infradead.org, ziy, hannes, kas, shakeel.butt, kernel-team
On 20/03/2026 18:58, Matthew Wilcox wrote:
> On Fri, Mar 20, 2026 at 06:58:53AM -0700, Usama Arif wrote:
>> -static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
>> +static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr,
>> + struct file *filp)
>> {
>> unsigned long alignment = 0;
>> + unsigned long max_folio_size = PAGE_SIZE;
>> int i;
>>
>> + if (filp && filp->f_mapping)
>> + max_folio_size = mapping_max_folio_size(filp->f_mapping);
>
> Under what circumstances can bprm->file be NULL?
Yeah, it's unnecessary here. It's used in other places and is never
checked there, so we can remove the check.
>
> Also we tend to prefer the name "file" rather than "filp" for new code
> (yes, there's a lot of old code out there).
>
ack. will change in next revision.
>> +
>> + /*
>> + * Try to align the binary to the largest folio
>> + * size that the page cache supports, so the
>> + * hardware can coalesce PTEs (e.g. arm64
>> + * contpte) or use PMD mappings for large folios.
>> + *
>> + * Use the largest power-of-2 that fits within
>> + * the segment size, capped by what the page
>> + * cache will allocate. Only align when the
>> + * segment's virtual address and file offset are
>> + * already aligned to the folio size, as
>> + * misalignment would prevent coalescing anyway.
>> + *
>> + * The segment size check avoids reducing ASLR
>> + * entropy for small binaries that cannot
>> + * benefit.
>> + */
>> + if (!cmds[i].p_filesz)
>> + continue;
>> + size = rounddown_pow_of_two(cmds[i].p_filesz);
>> + size = min(size, max_folio_size);
>> + if (size > PAGE_SIZE &&
>> + IS_ALIGNED(cmds[i].p_vaddr, size) &&
>> + IS_ALIGNED(cmds[i].p_offset, size))
>> + alignment = max(alignment, size);
>
> Can this not all be factored out into a different function? Also, I
> think it was done a bit better here:
> https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc/
>
> + if (!IS_ALIGNED(cmd->p_vaddr | cmd->p_offset, PMD_SIZE))
> + return false;
>
ack, will try and address this accordingly.
Thanks for the reviews!!
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
[not found] ` <20260320140315.979307-4-usama.arif@linux.dev>
2026-03-20 14:55 ` [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing Kiryl Shutsemau
2026-03-20 15:58 ` Matthew Wilcox
@ 2026-03-20 16:05 ` WANG Rui
2026-03-20 17:47 ` Matthew Wilcox
2026-03-27 16:53 ` Usama Arif
2 siblings, 2 replies; 16+ messages in thread
From: WANG Rui @ 2026-03-20 16:05 UTC (permalink / raw)
To: usama.arif
Cc: Liam.Howlett, ajd, akpm, apopple, baohua, baolin.wang, brauner,
catalin.marinas, david, dev.jain, jack, kees, kevin.brodsky,
lance.yang, linux-arm-kernel, linux-fsdevel, linux-fsdevel,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, npache,
pasha.tatashin, r, rmclure, rppt, ryan.roberts, surenb, vbabka,
viro, willy
Hi Usama,
On Fri, Mar 20, 2026 at 10:04 PM Usama Arif <usama.arif@linux.dev> wrote:
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 8e89cc5b28200..042af81766fcd 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -49,6 +49,7 @@
> #include <uapi/linux/rseq.h>
> #include <asm/param.h>
> #include <asm/page.h>
> +#include <linux/pagemap.h>
>
> #ifndef ELF_COMPAT
> #define ELF_COMPAT 0
> @@ -488,19 +489,51 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
> return 0;
> }
>
> -static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
> +static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr,
> + struct file *filp)
> {
> unsigned long alignment = 0;
> + unsigned long max_folio_size = PAGE_SIZE;
> int i;
>
> + if (filp && filp->f_mapping)
> + max_folio_size = mapping_max_folio_size(filp->f_mapping);
From experiments (with 16K base pages), mapping_max_folio_size() appears to
depend on the filesystem. It returns 8M on ext4, while on btrfs it always
falls back to PAGE_SIZE (it seems CONFIG_BTRFS_EXPERIMENTAL=y may change this).
This looks overly conservative and ends up missing practical optimization
opportunities.
> +
> for (i = 0; i < nr; i++) {
> if (cmds[i].p_type == PT_LOAD) {
> unsigned long p_align = cmds[i].p_align;
> + unsigned long size;
>
> /* skip non-power of two alignments as invalid */
> if (!is_power_of_2(p_align))
> continue;
> alignment = max(alignment, p_align);
> +
> + /*
> + * Try to align the binary to the largest folio
> + * size that the page cache supports, so the
> + * hardware can coalesce PTEs (e.g. arm64
> + * contpte) or use PMD mappings for large folios.
> + *
> + * Use the largest power-of-2 that fits within
> + * the segment size, capped by what the page
> + * cache will allocate. Only align when the
> + * segment's virtual address and file offset are
> + * already aligned to the folio size, as
> + * misalignment would prevent coalescing anyway.
> + *
> + * The segment size check avoids reducing ASLR
> + * entropy for small binaries that cannot
> + * benefit.
> + */
> + if (!cmds[i].p_filesz)
> + continue;
> + size = rounddown_pow_of_two(cmds[i].p_filesz);
> + size = min(size, max_folio_size);
> + if (size > PAGE_SIZE &&
> + IS_ALIGNED(cmds[i].p_vaddr, size) &&
> + IS_ALIGNED(cmds[i].p_offset, size))
> + alignment = max(alignment, size);
In my patch [1], by aligning eligible segments to PMD_SIZE, THP can quickly
collapse them into large mappings with minimal warmup. That doesn’t happen
with the current behavior. I think allowing a reasonably sized PMD (say <= 32M)
is worth considering. All we really need here is to ensure virtual address
alignment. The rest can be left to THP under always, which can decide whether
to collapse or not based on memory pressure and other factors.
[1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
> }
> }
>
> @@ -1104,7 +1137,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
> }
>
> /* Calculate any requested alignment. */
> - alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
> + alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum,
> + bprm->file);
>
> /**
> * DOC: PIE handling
> --
> 2.52.0
>
Thanks,
Rui
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
2026-03-20 16:05 ` WANG Rui
@ 2026-03-20 17:47 ` Matthew Wilcox
2026-03-27 16:53 ` Usama Arif
1 sibling, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2026-03-20 17:47 UTC (permalink / raw)
To: WANG Rui
Cc: usama.arif, Liam.Howlett, ajd, akpm, apopple, baohua, baolin.wang,
brauner, catalin.marinas, david, dev.jain, jack, kees,
kevin.brodsky, lance.yang, linux-arm-kernel, linux-fsdevel,
linux-fsdevel, linux-kernel, linux-mm, lorenzo.stoakes, mhocko,
npache, pasha.tatashin, rmclure, rppt, ryan.roberts, surenb,
vbabka, viro
On Sat, Mar 21, 2026 at 12:05:18AM +0800, WANG Rui wrote:
> From experiments (with 16K base pages), mapping_max_folio_size() appears to
> depend on the filesystem. It returns 8M on ext4, while on btrfs it always
> falls back to PAGE_SIZE (it seems CONFIG_BTRFS_EXPERIMENTAL=y may change this).
> This looks overly conservative and ends up missing practical optimization
> opportunities.
btrfs only supports large folios with CONFIG_BTRFS_EXPERIMENTAL.
I mean, it's only been five years since it was added to XFS, can't rush
these things.
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
2026-03-20 16:05 ` WANG Rui
2026-03-20 17:47 ` Matthew Wilcox
@ 2026-03-27 16:53 ` Usama Arif
2026-03-29 4:37 ` WANG Rui
1 sibling, 1 reply; 16+ messages in thread
From: Usama Arif @ 2026-03-27 16:53 UTC (permalink / raw)
To: WANG Rui
Cc: Liam.Howlett, ajd, akpm, apopple, baohua, baolin.wang, brauner,
catalin.marinas, david, dev.jain, jack, kees, kevin.brodsky,
lance.yang, linux-arm-kernel, linux-fsdevel, linux-fsdevel,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, npache,
pasha.tatashin, rmclure, rppt, ryan.roberts, surenb, vbabka, viro,
willy
On 20/03/2026 19:05, WANG Rui wrote:
> Hi Usama,
>
> On Fri, Mar 20, 2026 at 10:04 PM Usama Arif <usama.arif@linux.dev> wrote:
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index 8e89cc5b28200..042af81766fcd 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -49,6 +49,7 @@
>> #include <uapi/linux/rseq.h>
>> #include <asm/param.h>
>> #include <asm/page.h>
>> +#include <linux/pagemap.h>
>>
>> #ifndef ELF_COMPAT
>> #define ELF_COMPAT 0
>> @@ -488,19 +489,51 @@ static int elf_read(struct file *file, void *buf, size_t len, loff_t pos)
>> return 0;
>> }
>>
>> -static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr)
>> +static unsigned long maximum_alignment(struct elf_phdr *cmds, int nr,
>> + struct file *filp)
>> {
>> unsigned long alignment = 0;
>> + unsigned long max_folio_size = PAGE_SIZE;
>> int i;
>>
>> + if (filp && filp->f_mapping)
>> + max_folio_size = mapping_max_folio_size(filp->f_mapping);
>
> From experiments (with 16K base pages), mapping_max_folio_size() appears to
> depend on the filesystem. It returns 8M on ext4, while on btrfs it always
> falls back to PAGE_SIZE (it seems CONFIG_BTRFS_EXPERIMENTAL=y may change this).
> This looks overly conservative and ends up missing practical optimization
> opportunities.
mapping_max_folio_size() reflects what the page cache will actually
allocate for a given filesystem, since readahead caps folio allocation
at mapping_max_folio_order() (in page_cache_ra_order()). If btrfs
reports PAGE_SIZE, readahead won't allocate large folios for it, so
there are no large folios to coalesce PTEs for; aligning the binary
beyond that would only reduce ASLR entropy for no benefit.
I don't think we should over-align binaries on filesystems that can't
take advantage of it.
>
>> +
>> for (i = 0; i < nr; i++) {
>> if (cmds[i].p_type == PT_LOAD) {
>> unsigned long p_align = cmds[i].p_align;
>> + unsigned long size;
>>
>> /* skip non-power of two alignments as invalid */
>> if (!is_power_of_2(p_align))
>> continue;
>> alignment = max(alignment, p_align);
>> +
>> + /*
>> + * Try to align the binary to the largest folio
>> + * size that the page cache supports, so the
>> + * hardware can coalesce PTEs (e.g. arm64
>> + * contpte) or use PMD mappings for large folios.
>> + *
>> + * Use the largest power-of-2 that fits within
>> + * the segment size, capped by what the page
>> + * cache will allocate. Only align when the
>> + * segment's virtual address and file offset are
>> + * already aligned to the folio size, as
>> + * misalignment would prevent coalescing anyway.
>> + *
>> + * The segment size check avoids reducing ASLR
>> + * entropy for small binaries that cannot
>> + * benefit.
>> + */
>> + if (!cmds[i].p_filesz)
>> + continue;
>> + size = rounddown_pow_of_two(cmds[i].p_filesz);
>> + size = min(size, max_folio_size);
>> + if (size > PAGE_SIZE &&
>> + IS_ALIGNED(cmds[i].p_vaddr, size) &&
>> + IS_ALIGNED(cmds[i].p_offset, size))
>> + alignment = max(alignment, size);
>
> In my patch [1], by aligning eligible segments to PMD_SIZE, THP can quickly
> collapse them into large mappings with minimal warmup. That doesn’t happen
> with the current behavior. I think allowing a reasonably sized PMD (say <= 32M)
> is worth considering. All we really need here is to ensure virtual address
> alignment. The rest can be left to THP under always, which can decide whether
> to collapse or not based on memory pressure and other factors.
>
> [1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
>
>> }
>> }
>>
>> @@ -1104,7 +1137,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
>> }
>>
>> /* Calculate any requested alignment. */
>> - alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
>> + alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum,
>> + bprm->file);
>>
>> /**
>> * DOC: PIE handling
>> --
>> 2.52.0
>>
>
> Thanks,
> Rui
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
2026-03-27 16:53 ` Usama Arif
@ 2026-03-29 4:37 ` WANG Rui
2026-03-30 12:56 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: WANG Rui @ 2026-03-29 4:37 UTC (permalink / raw)
To: usama.arif
Cc: Liam.Howlett, ajd, akpm, apopple, baohua, baolin.wang, brauner,
catalin.marinas, david, dev.jain, jack, kees, kevin.brodsky,
lance.yang, linux-arm-kernel, linux-fsdevel, linux-kernel,
linux-mm, lorenzo.stoakes, mhocko, npache, pasha.tatashin, r,
rmclure, rppt, ryan.roberts, surenb, vbabka, viro, willy
> mapping_max_folio_size() reflects what the page cache will actually
> allocate for a given filesystem, since readahead caps folio allocation
> at mapping_max_folio_order() (in page_cache_ra_order()). If btrfs
> reports PAGE_SIZE, readahead won't allocate large folios for it, so
> there are no large folios to coalesce PTEs for; aligning the binary
> beyond that would only reduce ASLR entropy for no benefit.
>
> I don't think we should over-align binaries on filesystems that can't
> take advantage of it.
Ah, it looks like this might be overlooking another path that can create
huge page mappings for read-only code segments: even when the filesystem
(e.g. btrfs without CONFIG_BTRFS_EXPERIMENTAL) doesn't support large folios,
READ_ONLY_THP_FOR_FS still allows read-only file-backed code segments
to be collapsed into huge page mappings via khugepaged.
As Wilcox pointed out, it may take quite some time for many filesystems
to gain full large folio support? So what I'm trying to clarify is that
using mapping_max_folio_size() on this path is not favorable for
khugepaged-based optimizations.
Thanks,
Rui
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
2026-03-29 4:37 ` WANG Rui
@ 2026-03-30 12:56 ` Matthew Wilcox
2026-03-30 14:00 ` Usama Arif
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2026-03-30 12:56 UTC (permalink / raw)
To: WANG Rui
Cc: usama.arif, Liam.Howlett, ajd, akpm, apopple, baohua, baolin.wang,
brauner, catalin.marinas, david, dev.jain, jack, kees,
kevin.brodsky, lance.yang, linux-arm-kernel, linux-fsdevel,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, npache,
pasha.tatashin, rmclure, rppt, ryan.roberts, surenb, vbabka, viro
On Sun, Mar 29, 2026 at 12:37:00PM +0800, WANG Rui wrote:
> > mapping_max_folio_size() reflects what the page cache will actually
> > allocate for a given filesystem, since readahead caps folio allocation
> > at mapping_max_folio_order() (in page_cache_ra_order()). If btrfs
> > reports PAGE_SIZE, readahead won't allocate large folios for it, so
> > there are no large folios to coalesce PTEs for; aligning the binary
> > beyond that would only reduce ASLR entropy for no benefit.
> >
> > I don't think we should over-align binaries on filesystems that can't
> > take advantage of it.
>
> Ah, it looks like this might be overlooking another path that can create
> huge page mappings for read-only code segments: even when the filesystem
> (e.g. btrfs without CONFIG_BTRFS_EXPERIMENTAL) doesn't support large folios,
> READ_ONLY_THP_FOR_FS still allows read-only file-backed code segments
> to be collapsed into huge page mappings via khugepaged.
>
> As Wilcox pointed out, it may take quite some time for many filesystems
> to gain full large folio support? So what I'm trying to clarify is that
> using mapping_max_folio_size() on this path is not favorable for
> khugepaged-based optimizations.
Nono, that's not what I'm pointing out! btrfs is simply not putting
in the effort to support large folios, and that needs to change.
READ_ONLY_THP_FOR_FS unnecessarily burdens the rest of the kernel.
It was a great hack for its time and paved the path for a lot of what
we have today, but it's time to remove it.
* Re: [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing
2026-03-30 12:56 ` Matthew Wilcox
@ 2026-03-30 14:00 ` Usama Arif
0 siblings, 0 replies; 16+ messages in thread
From: Usama Arif @ 2026-03-30 14:00 UTC (permalink / raw)
To: Matthew Wilcox, WANG Rui
Cc: Liam.Howlett, ajd, akpm, apopple, baohua, baolin.wang, brauner,
catalin.marinas, david, dev.jain, jack, kees, kevin.brodsky,
lance.yang, linux-arm-kernel, linux-fsdevel, linux-kernel,
linux-mm, lorenzo.stoakes, mhocko, npache, pasha.tatashin,
rmclure, rppt, ryan.roberts, surenb, vbabka, viro
On 30/03/2026 15:56, Matthew Wilcox wrote:
> On Sun, Mar 29, 2026 at 12:37:00PM +0800, WANG Rui wrote:
>>> mapping_max_folio_size() reflects what the page cache will actually
>>> allocate for a given filesystem, since readahead caps folio allocation
>>> at mapping_max_folio_order() (in page_cache_ra_order()). If btrfs
>>> reports PAGE_SIZE, readahead won't allocate large folios for it, so
>>> there are no large folios to coalesce PTEs for; aligning the binary
>>> beyond that would only reduce ASLR entropy for no benefit.
>>>
>>> I don't think we should over-align binaries on filesystems that can't
>>> take advantage of it.
>>
>> Ah, it looks like this might be overlooking another path that can create
>> huge page mappings for read-only code segments: even when the filesystem
>> (e.g. btrfs without CONFIG_BTRFS_EXPERIMENTAL) doesn't support large folios,
>> READ_ONLY_THP_FOR_FS still allows read-only file-backed code segments
>> to be collapsed into huge page mappings via khugepaged.
Ah yes, thank you for pointing this out!
Maybe we should rename mapping_max_folio_size() to mapping_fault_max_folio_size().
>>
>> As Wilcox pointed out, it may take quite some time for many filesystems
>> to gain full large folio support? So what I'm trying to clarify is that
>> using mapping_max_folio_size() on this path is not favorable for
>> khugepaged-based optimizations.
ack
I am worried that 32M is too large and we lose out on a lot of ASLR bits.
Instead of PMD_ORDER, should we do max(SZ_2M, PMD_ORDER)?
> Nono, that's not what I'm pointing out! btrfs is simply not putting
> in the effort to support large folios, and that needs to change.
> READ_ONLY_THP_FOR_FS unnecessarily burdens the rest of the kernel.
> It was a great hack for its time and paved the path for a lot of what
> we have today, but it's time to remove it.