From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Frederick Mayle <fmayle@google.com>, Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>, Jan Kara <jack@suse.cz>,
Kalesh Singh <kaleshsingh@google.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH mm-hotfixses] Revert "mm: limit filemap_fault readahead to VMA boundaries"
Date: Wed, 24 Jun 2026 11:04:17 +0200 [thread overview]
Message-ID: <83df186a-c3e1-488d-8981-94192f36a2ea@kernel.org> (raw)
In-Reply-To: <CAHCxdc4tBb+U7LrrHH+sTEr5Mv_0hJKV6TMZNrrjb-p28sPBcw@mail.gmail.com>
On 6/23/26 21:28, Frederick Mayle wrote:
> On Mon, Jun 22, 2026 at 3:29 PM Pedro Falcato <pfalcato@suse.de> wrote:
>>
>> On Mon, Jun 22, 2026 at 10:57:30AM -0700, Suren Baghdasaryan wrote:
>>>
>>> Thanks for the suggestion! That sounds sensible to me.
>>
>> I don't think this works. Here's an example readelf -a from a random,
>> trivial ELF I have:
>>
>> pfalcato@pedro-suse:~/linux> cc -g main.c
>> pfalcato@pedro-suse:~/linux> readelf -a a.out
>> ELF Header:
>> Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
>> Class: ELF64
>> Data: 2's complement, little endian
>> Version: 1 (current)
>> OS/ABI: UNIX - System V
>> ABI Version: 0
>> Type: EXEC (Executable file)
>> Machine: Advanced Micro Devices X86-64
>> Version: 0x1
>> Entry point address: 0x401040
>> Start of program headers: 64 (bytes into file)
>> Start of section headers: 18448 (bytes into file)
>> Flags: 0x0
>> Size of this header: 64 (bytes)
>> Size of program headers: 56 (bytes)
>> Number of program headers: 14
>> Size of section headers: 64 (bytes)
>> Number of section headers: 38
>> Section header string table index: 37
>>
>> Section Headers:
>> [Nr] Name Type Address Offset
>> Size EntSize Flags Link Info Align
>> [ 0] NULL 0000000000000000 00000000
>> 0000000000000000 0000000000000000 0 0 0
>> [ 1] .note.gnu.pr[...] NOTE 0000000000400350 00000350
>> 0000000000000040 0000000000000000 A 0 0 8
>> [ 2] .note.gnu.bu[...] NOTE 0000000000400390 00000390
>> 0000000000000024 0000000000000000 A 0 0 4
>> [ 3] .interp PROGBITS 00000000004003b4 000003b4
>> 000000000000001c 0000000000000000 A 0 0 1
>> [ 4] .hash HASH 00000000004003d0 000003d0
>> 0000000000000024 0000000000000004 A 6 0 8
>> [ 5] .gnu.hash GNU_HASH 00000000004003f8 000003f8
>> 000000000000001c 0000000000000000 A 6 0 8
>> [ 6] .dynsym DYNSYM 0000000000400418 00000418
>> 0000000000000060 0000000000000018 A 7 1 8
>> [ 7] .dynstr STRTAB 0000000000400478 00000478
>> 000000000000004a 0000000000000000 A 0 0 1
>> [ 8] .gnu.version VERSYM 00000000004004c2 000004c2
>> 0000000000000008 0000000000000002 A 6 0 2
>> [ 9] .gnu.version_r VERNEED 00000000004004d0 000004d0
>> 0000000000000030 0000000000000000 A 7 1 8
>> [10] .rela.dyn RELA 0000000000400500 00000500
>> 0000000000000030 0000000000000018 A 6 0 8
>> [11] .rela.plt RELA 0000000000400530 00000530
>> 0000000000000018 0000000000000018 AI 6 24 8
>> [12] .init PROGBITS 0000000000401000 00001000
>> 000000000000001b 0000000000000000 AX 0 0 4
>> [13] .plt PROGBITS 0000000000401020 00001020
>> 0000000000000020 0000000000000010 AX 0 0 16
>> [14] .text PROGBITS 0000000000401040 00001040
>> 000000000000011b 0000000000000000 AX 0 0 16
>> [15] .fini PROGBITS 000000000040115c 0000115c
>> 000000000000000d 0000000000000000 AX 0 0 4
>> [16] .rodata PROGBITS 0000000000402000 00002000
>> 0000000000000004 0000000000000004 AM 0 0 4
>> [17] .eh_frame_hdr PROGBITS 0000000000402004 00002004
>> 000000000000002c 0000000000000000 A 0 0 4
>> [18] .eh_frame PROGBITS 0000000000402030 00002030
>> 0000000000000088 0000000000000000 A 0 0 8
>> [19] .note.ABI-tag NOTE 00000000004020b8 000020b8
>> 0000000000000020 0000000000000000 A 0 0 4
>> [20] .init_array INIT_ARRAY 0000000000403de8 00002de8
>> 0000000000000008 0000000000000008 WA 0 0 8
>> [21] .fini_array FINI_ARRAY 0000000000403df0 00002df0
>> 0000000000000008 0000000000000008 WA 0 0 8
>> [22] .dynamic DYNAMIC 0000000000403df8 00002df8
>> 00000000000001e0 0000000000000010 WA 7 0 8
>> [23] .got PROGBITS 0000000000403fd8 00002fd8
>> 0000000000000010 0000000000000008 WA 0 0 8
>> [24] .got.plt PROGBITS 0000000000403fe8 00002fe8
>> 0000000000000020 0000000000000008 WA 0 0 8
>> [25] .data PROGBITS 0000000000404008 00003008
>> 0000000000000010 0000000000000000 WA 0 0 8
>> [26] .bss NOBITS 0000000000404018 00003018
>> 0000000000000008 0000000000000000 WA 0 0 1
>> [27] .comment PROGBITS 0000000000000000 00003018
>> 0000000000000019 0000000000000001 MS 0 0 1
>> [28] .debug_aranges PROGBITS 0000000000000000 00003040
>> 0000000000000150 0000000000000000 0 0 16
>> [29] .debug_info PROGBITS 0000000000000000 00003190
>> 0000000000000444 0000000000000000 0 0 1
>> [30] .debug_abbrev PROGBITS 0000000000000000 000035d4
>> 0000000000000245 0000000000000000 0 0 1
>> [31] .debug_line PROGBITS 0000000000000000 00003819
>> 0000000000000274 0000000000000000 0 0 1
>> [32] .debug_str PROGBITS 0000000000000000 00003a8d
>> 0000000000000540 0000000000000001 MS 0 0 1
>> [33] .debug_line_str PROGBITS 0000000000000000 00003fcd
>> 0000000000000163 0000000000000001 MS 0 0 1
>> [34] .debug_rnglists PROGBITS 0000000000000000 00004130
>> 0000000000000042 0000000000000000 0 0 1
>> [35] .symtab SYMTAB 0000000000000000 00004178
>> 0000000000000360 0000000000000018 36 20 8
>> [36] .strtab STRTAB 0000000000000000 000044d8
>> 00000000000001bc 0000000000000000 0 0 1
>> [37] .shstrtab STRTAB 0000000000000000 00004694
>> 0000000000000176 0000000000000000 0 0 1
>> Key to Flags:
>> W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>> L (link order), O (extra OS processing required), G (group), T (TLS),
>> C (compressed), x (unknown), o (OS specific), E (exclude),
>> D (mbind), l (large), p (processor specific)
>>
>> Notice the section header table, and how it starts after program text and
>> program data, and how all the other ELF gunk (debug info, symtab,
>> strtab(s)) also goes after .data. So (mostly) the real problematic readahead
>> would be on the RW VMA that covers .data.
>>
>> (This also matches my understanding of linkers, where they generally do
>> (to put it simply) ELF headers - program headers - .text - .data - .bss, with
>> stripable gunk after it.)
>>
>> It's also the case that synchronous RA on VM_EXEC is already pretty
>> conservative and limited, see the big if (vm_flags & VM_EXEC) in
>> do_sync_mmap_readahead(). (I think the underlying logic behind also
>> implies that async RA will not be started against these pages, but I
>> am not sure).
>>
>> --
>> Pedro
>
> Yes, I think readahead of VM_EXEC is already restricted to the VMA.
> Maybe there is an edge case where someone does buffered reads on an
> ELF file, leaving a PG_readahead flag inside the VM_EXEC range, then
> it could trigger async readahead beyond the end, but that sounds
> minor.
>
> For next steps: Suppose we show the mmap usage in this video encoder
> is significantly inefficient compared to buffered reads or a big mmap
> and that project accepts a contribution to move away from the small
> mmaps. Would we be comfortable attempting this again as is? Probably
> there would be a lag before all users of the encoder update and they
> may see this bad perf.
How could you be sure that there isn't some other proprietary (or even just
another open-source) software out there that relies on similar things, but
doesn't provide easy benchmarks that can easily catch this?
--
Cheers,
David
prev parent reply other threads:[~2026-06-24 9:04 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-19 11:28 [PATCH mm-hotfixses] Revert "mm: limit filemap_fault readahead to VMA boundaries" Lorenzo Stoakes
2026-06-19 11:40 ` David Hildenbrand (Arm)
2026-06-19 11:58 ` Pedro Falcato
2026-06-19 15:47 ` Matthew Wilcox
2026-06-19 16:37 ` Andrew Morton
2026-06-19 16:52 ` Matthew Wilcox
2026-06-19 17:07 ` Lorenzo Stoakes
2026-06-19 17:08 ` Lorenzo Stoakes
2026-06-19 17:18 ` Pedro Falcato
2026-06-19 17:43 ` Suren Baghdasaryan
2026-06-19 17:46 ` Matthew Wilcox
2026-06-19 17:52 ` Suren Baghdasaryan
2026-06-19 19:26 ` Pedro Falcato
2026-06-19 19:33 ` Suren Baghdasaryan
2026-06-22 9:26 ` Jan Kara
2026-06-22 13:55 ` Lorenzo Stoakes
2026-06-22 16:58 ` Andrew Morton
2026-06-22 17:11 ` Matthew Wilcox
2026-06-22 17:57 ` Suren Baghdasaryan
2026-06-22 19:29 ` Pedro Falcato
2026-06-23 19:28 ` Frederick Mayle
2026-06-24 9:04 ` David Hildenbrand (Arm) [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=83df186a-c3e1-488d-8981-94192f36a2ea@kernel.org \
--to=david@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=fmayle@google.com \
--cc=jack@suse.cz \
--cc=kaleshsingh@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=pfalcato@suse.de \
--cc=surenb@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox