From: Mike Rapoport <rppt@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Arnd Bergmann <arnd@arndb.de>,
Christian Brauner <brauner@kernel.org>,
linux-mm@kvack.org, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org, SeongJae Park <sj@kernel.org>,
Usama Arif <usamaarif642@gmail.com>,
linux-api@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] add process_madvise() flags to modify behaviour
Date: Thu, 22 May 2025 15:12:05 +0300
Message-ID: <aC8UlSupN7_YXfma@kernel.org>
In-Reply-To: <cover.1747686021.git.lorenzo.stoakes@oracle.com>
(cc'ing linux-api)
On Mon, May 19, 2025 at 09:52:37PM +0100, Lorenzo Stoakes wrote:
> REVIEWERS NOTES:
> ================
>
> This is a VERY EARLY version of the idea, it's relatively untested, and I'm
> 'putting it out there' for feedback. Any serious version of this will add a
> bunch of self-tests to assert correct behaviour and I will more carefully
> confirm everything's working.
>
> This is based on discussion arising from Usama's series [0], SJ's input on
> the thread around process_madvise() behaviour [1] (and a subsequent
> response by me [2]) and prior discussion about a new madvise() interface
> [3].
>
> [0]: https://lore.kernel.org/linux-mm/20250515133519.2779639-1-usamaarif642@gmail.com/
> [1]: https://lore.kernel.org/linux-mm/20250517162048.36347-1-sj@kernel.org/
> [2]: https://lore.kernel.org/linux-mm/e3ba284c-3cb1-42c1-a0ba-9c59374d0541@lucifer.local/
> [3]: https://lore.kernel.org/linux-mm/c390dd7e-0770-4d29-bb0e-f410ff6678e3@lucifer.local/
>
> ================
>
> Currently, we are rather restricted in how madvise() operations
> proceed. While effort has been put into expanding what process_madvise()
> can do (that is, unrestricted application of advice to the local process,
> alongside recent improvements in the efficiency of TLB operations over
> these batches), we are still constrained by existing madvise() limitations
> and default behaviours.
>
> This series makes use of the currently unused flags field in
> process_madvise() to provide more flexibility.
>
> It introduces four flags:
>
> 1. PMADV_SKIP_ERRORS
>
> Currently, when an error arises applying advice in any individual VMA
> (keeping in mind that a range specified to madvise() or as part of the
> iovec passed to process_madvise() may span multiple VMAs), the operation
> stops where it is and returns an error.
>
> This might not be the desired behaviour of the user, who may wish instead
> for the operation to be 'best effort'. By setting this flag, that behaviour
> is obtained.
>
> Since, when skipping errors, process_madvise() would otherwise trivially
> return the input vector size, we instead return the number of entries in
> the vector which completed successfully without error.
>
> The PMADV_SKIP_ERRORS flag implies PMADV_NO_ERROR_ON_UNMAPPED.
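A minimal userspace model of the proposed return-value semantics may help here. It is not the series' kernel code: the flag bit values below are hypothetical placeholders, each iovec entry's outcome is assumed to be known up front (0 or a negative errno), and entries are counted rather than bytes. Without the flag, the model mirrors process_madvise()'s existing convention of reporting partial progress if any entry succeeded, and the error otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define PMADV_SKIP_ERRORS          (1U << 0)
#define PMADV_NO_ERROR_ON_UNMAPPED (1U << 1)

/* PMADV_SKIP_ERRORS implies PMADV_NO_ERROR_ON_UNMAPPED. */
static unsigned int pmadv_effective_flags(unsigned int flags)
{
	if (flags & PMADV_SKIP_ERRORS)
		flags |= PMADV_NO_ERROR_ON_UNMAPPED;
	return flags;
}

/*
 * Hypothetical model: results[i] is 0 if entry i was advised successfully,
 * or a negative errno. Returns a count of successful entries (not bytes).
 */
static long model_process_madvise(const int *results, size_t n,
				  unsigned int flags)
{
	size_t i, done = 0;

	assert(results != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (results[i] == 0) {
			done++;
			continue;
		}
		if (flags & PMADV_SKIP_ERRORS)
			continue;	/* best effort: move on to the next entry */
		/* Default: stop at the first error; report it only if nothing succeeded. */
		return done ? (long)done : (long)results[i];
	}
	return (long)done;
}
```

With results {0, -EINVAL, 0}, the default mode stops after one entry, while PMADV_SKIP_ERRORS reports two successful entries.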
>
> 2. PMADV_NO_ERROR_ON_UNMAPPED
>
> Currently, madvise() has the peculiar behaviour that, if the range
> specified to it contains unmapped range(s), it completes the full
> operation but ultimately returns -ENOMEM.
>
> In the case of process_madvise(), this is fatal, as the operation will stop
> immediately upon this occurring.
>
> By setting PMADV_NO_ERROR_ON_UNMAPPED, the user can indicate that it wishes
> unmapped areas to simply be entirely ignored.
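The madvise() behaviour described above can be sketched as a simplified model over a sorted array of VMAs. The struct and function names are hypothetical and the real kernel walk is more involved. The point is that a gap does not stop the walk: it is only remembered and turned into -ENOMEM at the end, and the proposed flag would simply suppress that final error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VMA: [start, end), sorted and non-overlapping. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	bool advised;
};

/*
 * Walk [start, end) over the sorted VMA array, marking every overlapping
 * VMA as advised. Gaps do not stop the walk; they are only remembered and
 * reported as -ENOMEM at the end unless ignore_unmapped is set.
 */
static int model_madvise_range(struct model_vma *vmas, size_t n,
			       unsigned long start, unsigned long end,
			       bool ignore_unmapped)
{
	bool saw_gap = false;
	unsigned long cur = start;
	size_t i;

	assert(vmas != NULL || n == 0);

	for (i = 0; i < n && cur < end; i++) {
		if (vmas[i].end <= cur)
			continue;		/* VMA lies entirely before the cursor */
		if (vmas[i].start >= end)
			break;			/* VMA lies entirely after the range */
		if (vmas[i].start > cur)
			saw_gap = true;		/* unmapped hole before this VMA */
		vmas[i].advised = true;		/* "apply" the advice */
		cur = vmas[i].end;
	}
	if (cur < end)
		saw_gap = true;			/* unmapped hole at the tail */

	return (saw_gap && !ignore_unmapped) ? -ENOMEM : 0;
}
```

Both VMAs around a hole are advised either way; only the final return value differs.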
>
> 3. PMADV_SET_FORK_EXEC_DEFAULT
>
> It may be desirable for a user to specify that all VMAs mapped in a process
> address space default to a particular madvise() behaviour, in such a
> fashion that this persists across fork/exec.
>
> Since this is a very powerful option that would make no sense for many
> advice modes, we explicitly only permit known-safe flags here (currently
> MADV_HUGEPAGE and MADV_NOHUGEPAGE only).
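The allowlist restriction could look something like the following sketch. The helper name is hypothetical and this is not the series' implementation; MADV_HUGEPAGE, MADV_NOHUGEPAGE and MADV_DONTNEED are real advice values, with fallback definitions (the asm-generic values) only in case the libc headers do not expose them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Fallbacks using the asm-generic values, in case libc does not expose them. */
#ifndef MADV_DONTNEED
#define MADV_DONTNEED 4
#endif
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14
#endif
#ifndef MADV_NOHUGEPAGE
#define MADV_NOHUGEPAGE 15
#endif

/*
 * Hypothetical helper: only advice known to be safe to inherit across
 * fork/exec may become the default for a process address space.
 */
static bool fork_exec_default_allowed(int advice)
{
	switch (advice) {
	case MADV_HUGEPAGE:
	case MADV_NOHUGEPAGE:
		return true;
	default:
		return false;	/* e.g. MADV_DONTNEED makes no sense as a default */
	}
}
```

An explicit switch keeps the permitted set obvious and makes extending it a deliberate, reviewable change.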
>
> 4. PMADV_ENTIRE_ADDRESS_SPACE
>
> If a user wishes to apply madvise() to all VMAs in an address space, it
> can be annoying to have to add a single large entry to the input iovec.
>
> So provide syntactic sugar to permit this: PMADV_ENTIRE_ADDRESS_SPACE. If
> specified, we expect the user to pass NULL and -1 to the vec and vlen
> parameters respectively, so they explicitly acknowledge that these will
> be ignored, e.g.:
>
> 	process_madvise(PIDFD_SELF, NULL, -1, MADV_HUGEPAGE,
> 			PMADV_ENTIRE_ADDRESS_SPACE | PMADV_SKIP_ERRORS);
>
> Usually a user ought to set PMADV_SKIP_ERRORS here, as incompatible VMAs
> that ought to be skipped may well be encountered.
>
> If it is not set, PMADV_NO_ERROR_ON_UNMAPPED (which PMADV_SKIP_ERRORS
> otherwise implies) ought to be set since, of course, the entire address
> space contains at least some gaps.
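A sketch of the argument contract described for PMADV_ENTIRE_ADDRESS_SPACE, plus a check for the recommended companion flags. The flag bit values and helper names are hypothetical. Note that vlen is a size_t in process_madvise()'s signature, so the -1 acknowledgement arrives as (size_t)-1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical flag values, for illustration only. */
#define PMADV_SKIP_ERRORS          (1U << 0)
#define PMADV_NO_ERROR_ON_UNMAPPED (1U << 1)
#define PMADV_ENTIRE_ADDRESS_SPACE (1U << 3)

/*
 * Hypothetical argument check: with PMADV_ENTIRE_ADDRESS_SPACE the caller
 * must pass vec == NULL and vlen == -1 to acknowledge both are ignored.
 */
static int check_entire_address_space_args(const struct iovec *vec,
					   size_t vlen, unsigned int flags)
{
	if (!(flags & PMADV_ENTIRE_ADDRESS_SPACE))
		return 0;	/* ordinary iovec handling applies */
	if (vec != NULL || vlen != (size_t)-1)
		return -EINVAL;
	return 0;
}

/*
 * The whole address space always contains gaps, so without one of these
 * flags a whole-address-space call would end in -ENOMEM.
 */
static int tolerates_gaps(unsigned int flags)
{
	return (flags & (PMADV_SKIP_ERRORS | PMADV_NO_ERROR_ON_UNMAPPED)) != 0;
}
```

Requiring the sentinel values, rather than silently ignoring whatever is passed, leaves room to give vec and vlen a meaning for this flag later.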
>
> Lorenzo Stoakes (5):
> mm: madvise: refactor madvise_populate()
> mm/madvise: add PMADV_SKIP_ERRORS process_madvise() flag
> mm/madvise: add PMADV_NO_ERROR_ON_UNMAPPED process_madvise() flag
> mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT process_madvise() flag
> mm/madvise: add PMADV_ENTIRE_ADDRESS_SPACE process_madvise() flag
>
> include/uapi/asm-generic/mman-common.h | 6 +
> mm/madvise.c | 206 +++++++++++++++++++------
> 2 files changed, 168 insertions(+), 44 deletions(-)
>
> --
> 2.49.0
>
--
Sincerely yours,
Mike.