All of lore.kernel.org
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Frederick Mayle <fmayle@google.com>, Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Jan Kara <jack@suse.cz>,
	Kalesh Singh <kaleshsingh@google.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH mm-hotfixses] Revert "mm: limit filemap_fault readahead to VMA boundaries"
Date: Wed, 24 Jun 2026 11:04:17 +0200	[thread overview]
Message-ID: <83df186a-c3e1-488d-8981-94192f36a2ea@kernel.org> (raw)
In-Reply-To: <CAHCxdc4tBb+U7LrrHH+sTEr5Mv_0hJKV6TMZNrrjb-p28sPBcw@mail.gmail.com>

On 6/23/26 21:28, Frederick Mayle wrote:
> On Mon, Jun 22, 2026 at 3:29 PM Pedro Falcato <pfalcato@suse.de> wrote:
>>
>> On Mon, Jun 22, 2026 at 10:57:30AM -0700, Suren Baghdasaryan wrote:
>>>
>>> Thanks for the suggestion! That sounds sensible to me.
>>
>> I don't think this works. Here's an example readelf -a from a random,
>> trivial ELF I have:
>>
>> pfalcato@pedro-suse:~/linux> cc -g main.c
>> pfalcato@pedro-suse:~/linux> readelf -a a.out
>> ELF Header:
>>   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
>>   Class:                             ELF64
>>   Data:                              2's complement, little endian
>>   Version:                           1 (current)
>>   OS/ABI:                            UNIX - System V
>>   ABI Version:                       0
>>   Type:                              EXEC (Executable file)
>>   Machine:                           Advanced Micro Devices X86-64
>>   Version:                           0x1
>>   Entry point address:               0x401040
>>   Start of program headers:          64 (bytes into file)
>>   Start of section headers:          18448 (bytes into file)
>>   Flags:                             0x0
>>   Size of this header:               64 (bytes)
>>   Size of program headers:           56 (bytes)
>>   Number of program headers:         14
>>   Size of section headers:           64 (bytes)
>>   Number of section headers:         38
>>   Section header string table index: 37
>>
>> Section Headers:
>>   [Nr] Name              Type             Address           Offset
>>        Size              EntSize          Flags  Link  Info  Align
>>   [ 0]                   NULL             0000000000000000  00000000
>>        0000000000000000  0000000000000000           0     0     0
>>   [ 1] .note.gnu.pr[...] NOTE             0000000000400350  00000350
>>        0000000000000040  0000000000000000   A       0     0     8
>>   [ 2] .note.gnu.bu[...] NOTE             0000000000400390  00000390
>>        0000000000000024  0000000000000000   A       0     0     4
>>   [ 3] .interp           PROGBITS         00000000004003b4  000003b4
>>        000000000000001c  0000000000000000   A       0     0     1
>>   [ 4] .hash             HASH             00000000004003d0  000003d0
>>        0000000000000024  0000000000000004   A       6     0     8
>>   [ 5] .gnu.hash         GNU_HASH         00000000004003f8  000003f8
>>        000000000000001c  0000000000000000   A       6     0     8
>>   [ 6] .dynsym           DYNSYM           0000000000400418  00000418
>>        0000000000000060  0000000000000018   A       7     1     8
>>   [ 7] .dynstr           STRTAB           0000000000400478  00000478
>>        000000000000004a  0000000000000000   A       0     0     1
>>   [ 8] .gnu.version      VERSYM           00000000004004c2  000004c2
>>        0000000000000008  0000000000000002   A       6     0     2
>>   [ 9] .gnu.version_r    VERNEED          00000000004004d0  000004d0
>>        0000000000000030  0000000000000000   A       7     1     8
>>   [10] .rela.dyn         RELA             0000000000400500  00000500
>>        0000000000000030  0000000000000018   A       6     0     8
>>   [11] .rela.plt         RELA             0000000000400530  00000530
>>        0000000000000018  0000000000000018  AI       6    24     8
>>   [12] .init             PROGBITS         0000000000401000  00001000
>>        000000000000001b  0000000000000000  AX       0     0     4
>>   [13] .plt              PROGBITS         0000000000401020  00001020
>>        0000000000000020  0000000000000010  AX       0     0     16
>>   [14] .text             PROGBITS         0000000000401040  00001040
>>        000000000000011b  0000000000000000  AX       0     0     16
>>   [15] .fini             PROGBITS         000000000040115c  0000115c
>>        000000000000000d  0000000000000000  AX       0     0     4
>>   [16] .rodata           PROGBITS         0000000000402000  00002000
>>        0000000000000004  0000000000000004  AM       0     0     4
>>   [17] .eh_frame_hdr     PROGBITS         0000000000402004  00002004
>>        000000000000002c  0000000000000000   A       0     0     4
>>   [18] .eh_frame         PROGBITS         0000000000402030  00002030
>>        0000000000000088  0000000000000000   A       0     0     8
>>   [19] .note.ABI-tag     NOTE             00000000004020b8  000020b8
>>        0000000000000020  0000000000000000   A       0     0     4
>>   [20] .init_array       INIT_ARRAY       0000000000403de8  00002de8
>>        0000000000000008  0000000000000008  WA       0     0     8
>>   [21] .fini_array       FINI_ARRAY       0000000000403df0  00002df0
>>        0000000000000008  0000000000000008  WA       0     0     8
>>   [22] .dynamic          DYNAMIC          0000000000403df8  00002df8
>>        00000000000001e0  0000000000000010  WA       7     0     8
>>   [23] .got              PROGBITS         0000000000403fd8  00002fd8
>>        0000000000000010  0000000000000008  WA       0     0     8
>>   [24] .got.plt          PROGBITS         0000000000403fe8  00002fe8
>>        0000000000000020  0000000000000008  WA       0     0     8
>>   [25] .data             PROGBITS         0000000000404008  00003008
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [26] .bss              NOBITS           0000000000404018  00003018
>>        0000000000000008  0000000000000000  WA       0     0     1
>>   [27] .comment          PROGBITS         0000000000000000  00003018
>>        0000000000000019  0000000000000001  MS       0     0     1
>>   [28] .debug_aranges    PROGBITS         0000000000000000  00003040
>>        0000000000000150  0000000000000000           0     0     16
>>   [29] .debug_info       PROGBITS         0000000000000000  00003190
>>        0000000000000444  0000000000000000           0     0     1
>>   [30] .debug_abbrev     PROGBITS         0000000000000000  000035d4
>>        0000000000000245  0000000000000000           0     0     1
>>   [31] .debug_line       PROGBITS         0000000000000000  00003819
>>        0000000000000274  0000000000000000           0     0     1
>>   [32] .debug_str        PROGBITS         0000000000000000  00003a8d
>>        0000000000000540  0000000000000001  MS       0     0     1
>>   [33] .debug_line_str   PROGBITS         0000000000000000  00003fcd
>>        0000000000000163  0000000000000001  MS       0     0     1
>>   [34] .debug_rnglists   PROGBITS         0000000000000000  00004130
>>        0000000000000042  0000000000000000           0     0     1
>>   [35] .symtab           SYMTAB           0000000000000000  00004178
>>        0000000000000360  0000000000000018          36    20     8
>>   [36] .strtab           STRTAB           0000000000000000  000044d8
>>        00000000000001bc  0000000000000000           0     0     1
>>   [37] .shstrtab         STRTAB           0000000000000000  00004694
>>        0000000000000176  0000000000000000           0     0     1
>> Key to Flags:
>>   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
>>   L (link order), O (extra OS processing required), G (group), T (TLS),
>>   C (compressed), x (unknown), o (OS specific), E (exclude),
>>   D (mbind), l (large), p (processor specific)
>>
>> Notice the section header table, and how it starts after program text and
>> program data, and how all the other ELF gunk (debug info, symtab,
>> strtab(s)) also goes after .data. So (mostly) the real problematic readahead
>> would be on the RW VMA that covers .data.
>>
>> (This also matches my understanding of linkers, where they generally do
>> (to put it simply) ELF headers - program headers - .text - .data - .bss, with
>> stripable gunk after it.)
>>
>> It's also the case that synchronous RA on VM_EXEC is already pretty
>> conservative and limited, see the big if (vm_flags & VM_EXEC) in
>> do_sync_mmap_readahead(). (I think the underlying logic behind also
>> implies that async RA will not be started against these pages, but I
>> am not sure).
>>
>> --
>> Pedro
> 
> Yes, I think readahead of VM_EXEC is already restricted to the VMA.
> Maybe there is an edge case where someone does buffered reads on an
> ELF file, leaving a PG_readahead flag inside the VM_EXEC range, then
> it could trigger async readahead beyond the end, but that sounds
> minor.
> 
> For next steps: Suppose we show the mmap usage in this video encoder
> is significantly inefficient compared to buffered reads or a big mmap
> and that project accepts a contribution to move away from the small
> mmaps. Would we be comfortable attempting this again as is? Probably
> there would be a lag before all users of the encoder update and they
> may see this bad perf.

How could you be sure that there isn't some other proprietary (or even just
another open-source) software out there that relies on similar things, but
doesn't provide easy benchmarks that can easily catch this?

-- 
Cheers,

David


      reply	other threads:[~2026-06-24  9:04 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-19 11:28 [PATCH mm-hotfixses] Revert "mm: limit filemap_fault readahead to VMA boundaries" Lorenzo Stoakes
2026-06-19 11:40 ` David Hildenbrand (Arm)
2026-06-19 11:58 ` Pedro Falcato
2026-06-19 15:47 ` Matthew Wilcox
2026-06-19 16:37 ` Andrew Morton
2026-06-19 16:52   ` Matthew Wilcox
2026-06-19 17:07     ` Lorenzo Stoakes
2026-06-19 17:08       ` Lorenzo Stoakes
2026-06-19 17:18       ` Pedro Falcato
2026-06-19 17:43         ` Suren Baghdasaryan
2026-06-19 17:46           ` Matthew Wilcox
2026-06-19 17:52             ` Suren Baghdasaryan
2026-06-19 19:26               ` Pedro Falcato
2026-06-19 19:33                 ` Suren Baghdasaryan
2026-06-22  9:26                   ` Jan Kara
2026-06-22 13:55                     ` Lorenzo Stoakes
2026-06-22 16:58                       ` Andrew Morton
2026-06-22 17:11                         ` Matthew Wilcox
2026-06-22 17:57                           ` Suren Baghdasaryan
2026-06-22 19:29                             ` Pedro Falcato
2026-06-23 19:28                               ` Frederick Mayle
2026-06-24  9:04                                 ` David Hildenbrand (Arm) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83df186a-c3e1-488d-8981-94192f36a2ea@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=fmayle@google.com \
    --cc=jack@suse.cz \
    --cc=kaleshsingh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=pfalcato@suse.de \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.