Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Frederick Mayle <fmayle@google.com>, Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Jan Kara <jack@suse.cz>,
	Kalesh Singh <kaleshsingh@google.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH mm-hotfixses] Revert "mm: limit filemap_fault readahead to VMA boundaries"
Date: Wed, 24 Jun 2026 11:04:17 +0200	[thread overview]
Message-ID: <83df186a-c3e1-488d-8981-94192f36a2ea@kernel.org> (raw)
In-Reply-To: <CAHCxdc4tBb+U7LrrHH+sTEr5Mv_0hJKV6TMZNrrjb-p28sPBcw@mail.gmail.com>

On 6/23/26 21:28, Frederick Mayle wrote:
> On Mon, Jun 22, 2026 at 3:29 PM Pedro Falcato <pfalcato@suse.de> wrote:
>>
>> On Mon, Jun 22, 2026 at 10:57:30AM -0700, Suren Baghdasaryan wrote:
>>>
>>> Thanks for the suggestion! That sounds sensible to me.
>>
>> I don't think this works. Here's an example readelf -a from a random,
>> trivial ELF I have:
>>
>> pfalcato@pedro-suse:~/linux> cc -g main.c
>> pfalcato@pedro-suse:~/linux> readelf -a a.out
>> ELF Header:
>>   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
>>   Class:                             ELF64
>>   Data:                              2's complement, little endian
>>   Version:                           1 (current)
>>   OS/ABI:                            UNIX - System V
>>   ABI Version:                       0
>>   Type:                              EXEC (Executable file)
>>   Machine:                           Advanced Micro Devices X86-64
>>   Version:                           0x1
>>   Entry point address:               0x401040
>>   Start of program headers:          64 (bytes into file)
>>   Start of section headers:          18448 (bytes into file)
>>   Flags:                             0x0
>>   Size of this header:               64 (bytes)
>>   Size of program headers:           56 (bytes)
>>   Number of program headers:         14
>>   Size of section headers:           64 (bytes)
>>   Number of section headers:         38
>>   Section header string table index: 37
>>
>> Section Headers:
>>   [Nr] Name              Type             Address           Offset
>>        Size              EntSize          Flags  Link  Info  Align
>>   [ 0]                   NULL             0000000000000000  00000000
>>        0000000000000000  0000000000000000           0     0     0
>>   [ 1] .note.gnu.pr[...] NOTE             0000000000400350  00000350
>>        0000000000000040  0000000000000000   A       0     0     8
>>   [ 2] .note.gnu.bu[...] NOTE             0000000000400390  00000390
>>        0000000000000024  0000000000000000   A       0     0     4
>>   [ 3] .interp           PROGBITS         00000000004003b4  000003b4
>>        000000000000001c  0000000000000000   A       0     0     1
>>   [ 4] .hash             HASH             00000000004003d0  000003d0
>>        0000000000000024  0000000000000004   A       6     0     8
>>   [ 5] .gnu.hash         GNU_HASH         00000000004003f8  000003f8
>>        000000000000001c  0000000000000000   A       6     0     8
>>   [ 6] .dynsym           DYNSYM           0000000000400418  00000418
>>        0000000000000060  0000000000000018   A       7     1     8
>>   [ 7] .dynstr           STRTAB           0000000000400478  00000478
>>        000000000000004a  0000000000000000   A       0     0     1
>>   [ 8] .gnu.version      VERSYM           00000000004004c2  000004c2
>>        0000000000000008  0000000000000002   A       6     0     2
>>   [ 9] .gnu.version_r    VERNEED          00000000004004d0  000004d0
>>        0000000000000030  0000000000000000   A       7     1     8
>>   [10] .rela.dyn         RELA             0000000000400500  00000500
>>        0000000000000030  0000000000000018   A       6     0     8
>>   [11] .rela.plt         RELA             0000000000400530  00000530
>>        0000000000000018  0000000000000018  AI       6    24     8
>>   [12] .init             PROGBITS         0000000000401000  00001000
>>        000000000000001b  0000000000000000  AX       0     0     4
>>   [13] .plt              PROGBITS         0000000000401020  00001020
>>        0000000000000020  0000000000000010  AX       0     0     16
>>   [14] .text             PROGBITS         0000000000401040  00001040
>>        000000000000011b  0000000000000000  AX       0     0     16
>>   [15] .fini             PROGBITS         000000000040115c  0000115c
>>        000000000000000d  0000000000000000  AX       0     0     4
>>   [16] .rodata           PROGBITS         0000000000402000  00002000
>>        0000000000000004  0000000000000004  AM       0     0     4
>>   [17] .eh_frame_hdr     PROGBITS         0000000000402004  00002004
>>        000000000000002c  0000000000000000   A       0     0     4
>>   [18] .eh_frame         PROGBITS         0000000000402030  00002030
>>        0000000000000088  0000000000000000   A       0     0     8
>>   [19] .note.ABI-tag     NOTE             00000000004020b8  000020b8
>>        0000000000000020  0000000000000000   A       0     0     4
>>   [20] .init_array       INIT_ARRAY       0000000000403de8  00002de8
>>        0000000000000008  0000000000000008  WA       0     0     8
>>   [21] .fini_array       FINI_ARRAY       0000000000403df0  00002df0
>>        0000000000000008  0000000000000008  WA       0     0     8
>>   [22] .dynamic          DYNAMIC          0000000000403df8  00002df8
>>        00000000000001e0  0000000000000010  WA       7     0     8
>>   [23] .got              PROGBITS         0000000000403fd8  00002fd8
>>        0000000000000010  0000000000000008  WA       0     0     8
>>   [24] .got.plt          PROGBITS         0000000000403fe8  00002fe8
>>        0000000000000020  0000000000000008  WA       0     0     8
>>   [25] .data             PROGBITS         0000000000404008  00003008
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [26] .bss              NOBITS           0000000000404018  00003018
>>        0000000000000008  0000000000000000  WA       0     0     1
>>   [27] .comment          PROGBITS         0000000000000000  00003018
>>        0000000000000019  0000000000000001  MS       0     0     1
>>   [28] .debug_aranges    PROGBITS         0000000000000000  00003040
>>        0000000000000150  0000000000000000           0     0     16
>>   [29] .debug_info       PROGBITS         0000000000000000  00003190
>>        0000000000000444  0000000000000000           0     0     1
>>   [30] .debug_abbrev     PROGBITS         0000000000000000  000035d4
>>        0000000000000245  0000000000000000           0     0     1
>>   [31] .debug_line       PROGBITS         0000000000000000  00003819
>>        0000000000000274  0000000000000000           0     0     1
>>   [32] .debug_str        PROGBITS         0000000000000000  00003a8d
>>        0000000000000540  0000000000000001  MS       0     0     1
>>   [33] .debug_line_str   PROGBITS         0000000000000000  00003fcd
>>        0000000000000163  0000000000000001  MS       0     0     1
>>   [34] .debug_rnglists   PROGBITS         0000000000000000  00004130
>>        0000000000000042  0000000000000000           0     0     1
>>   [35] .symtab           SYMTAB           0000000000000000  00004178
>>        0000000000000360  0000000000000018          36    20     8
>>   [36] .strtab           STRTAB           0000000000000000  000044d8
>>        00000000000001bc  0000000000000000           0     0     1
>>   [37] .shstrtab         STRTAB           0000000000000000  00004694
>>        0000000000000176  0000000000000000           0     0     1
>> Key to Flags:
>>   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>>   L (link order), O (extra OS processing required), G (group), T (TLS),
>>   C (compressed), x (unknown), o (OS specific), E (exclude),
>>   D (mbind), l (large), p (processor specific)
>>
>> Notice the section header table, and how it starts after program text and
>> program data, and how all the other ELF gunk (debug info, symtab,
>> strtab(s)) also goes after .data. So (mostly) the real problematic readahead
>> would be on the RW VMA that covers .data.
>>
>> (This also matches my understanding of linkers, where they generally do
>> (to put it simply) ELF headers - program headers - .text - .data - .bss, with
>> stripable gunk after it.)
>>
>> It's also the case that synchronous RA on VM_EXEC is already pretty
>> conservative and limited, see the big if (vm_flags & VM_EXEC) in
>> do_sync_mmap_readahead(). (I think the underlying logic behind also
>> implies that async RA will not be started against these pages, but I
>> am not sure).
>>
>> --
>> Pedro
> 
> Yes, I think readahead of VM_EXEC is already restricted to the VMA.
> Maybe there is an edge case where someone does buffered reads on an
> ELF file, leaving a PG_readahead flag inside the VM_EXEC range, then
> it could trigger async readahead beyond the end, but that sounds
> minor.
> 
> For next steps: Suppose we show the mmap usage in this video encoder
> is significantly inefficient compared to buffered reads or a big mmap
> and that project accepts a contribution to move away from the small
> mmaps. Would we be comfortable attempting this again as is? Probably
> there would be a lag before all users of the encoder update and they
> may see this bad perf.

How could you be sure that there isn't some other proprietary (or even just
another open-source) software out there that relies on similar things, but
doesn't provide easy benchmarks that can easily catch this?

-- 
Cheers,

David


      reply	other threads:[~2026-06-24  9:04 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-19 11:28 [PATCH mm-hotfixses] Revert "mm: limit filemap_fault readahead to VMA boundaries" Lorenzo Stoakes
2026-06-19 11:40 ` David Hildenbrand (Arm)
2026-06-19 11:58 ` Pedro Falcato
2026-06-19 15:47 ` Matthew Wilcox
2026-06-19 16:37 ` Andrew Morton
2026-06-19 16:52   ` Matthew Wilcox
2026-06-19 17:07     ` Lorenzo Stoakes
2026-06-19 17:08       ` Lorenzo Stoakes
2026-06-19 17:18       ` Pedro Falcato
2026-06-19 17:43         ` Suren Baghdasaryan
2026-06-19 17:46           ` Matthew Wilcox
2026-06-19 17:52             ` Suren Baghdasaryan
2026-06-19 19:26               ` Pedro Falcato
2026-06-19 19:33                 ` Suren Baghdasaryan
2026-06-22  9:26                   ` Jan Kara
2026-06-22 13:55                     ` Lorenzo Stoakes
2026-06-22 16:58                       ` Andrew Morton
2026-06-22 17:11                         ` Matthew Wilcox
2026-06-22 17:57                           ` Suren Baghdasaryan
2026-06-22 19:29                             ` Pedro Falcato
2026-06-23 19:28                               ` Frederick Mayle
2026-06-24  9:04                                 ` David Hildenbrand (Arm) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83df186a-c3e1-488d-8981-94192f36a2ea@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=fmayle@google.com \
    --cc=jack@suse.cz \
    --cc=kaleshsingh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=pfalcato@suse.de \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox