linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <willy@infradead.org>, Guo Ren <guoren@kernel.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	"David S . Miller" <davem@davemloft.net>,
	Andreas Larsson <andreas@gaisler.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Nicolas Pitre <nico@fluxnic.net>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Konstantin Komarov <almaz.alexandrovich@paragon-software.com>,
	Baoquan He <bhe@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Dave Young <dyoung@redhat.com>, Tony Luck <tony.luck@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Dave Martin <Dave.Martin@arm.com>,
	James Morse <james.morse@arm.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Jann Horn <jannh@google.com>, Pedro Falcato <pfalcato@suse.de>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-csky@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-s390@vger.kernel.org,
	sparclinux@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-mm@kvack.org,
	ntfs3@lists.linux.dev, kexec@lists.infradead.org,
	kasan-dev@googlegroups.com, Jason Gunthorpe <jgg@nvidia.com>
Subject: [PATCH v2 00/16] expand mmap_prepare functionality, port more users
Date: Wed, 10 Sep 2025 21:21:55 +0100
Message-ID: <cover.1757534913.git.lorenzo.stoakes@oracle.com>

Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap hook has been deprecated in favour of
f_op->mmap_prepare.

This was introduced in order to make it possible for us to eventually
eliminate the f_op->mmap hook, which is highly problematic as it allows
drivers and filesystems raw access to a VMA which is not yet correctly
initialised.

This hook also introduced complexity for the memory mapping operation, as
we must correctly unwind what we do should an error arise.

Overall, this interface being so open has caused significant problems for
us, including security issues. It is important for us to simply eliminate
this as a source of problems.

Therefore this series continues what was established by extending the
functionality further to permit more drivers and filesystems to use
mmap_prepare.

We start by updating some existing users who can use the mmap_prepare
functionality as-is.

We then introduce the concept of an mmap 'action', which a user, on
mmap_prepare, can request to be performed upon the VMA:

* Nothing - default, we're done
* Remap PFN - perform PFN remap with specified parameters
* Insert mixed - Insert a linear PFN range as a mixed map
* Insert mixed pages - Insert a set of specific pages as a mixed map
* Custom action - Should rarely be used, for operations that are truly
  custom. A hook is invoked.

Setting the action in mmap_prepare allows us to decide dynamically what to
do next, so if a driver/filesystem needs to determine whether to e.g. remap
or use a mixed map, it can do so and then select the appropriate action.
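
As a rough illustration of that flow, here is a small userspace model (not
kernel code). Every model_* name, the field layout and the is_device_memory
parameter are invented for the sketch; the real definitions live in the
patches themselves. The hook only records which action it wants, and the
core decides how to carry it out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical mirror of the action list above; not the kernel's enum. */
enum model_action_type {
	MODEL_ACTION_NONE,		/* Nothing: default, we're done */
	MODEL_ACTION_REMAP_PFN,		/* Perform a PFN remap */
	MODEL_ACTION_MIXEDMAP,		/* Insert a linear PFN range as a mixed map */
	MODEL_ACTION_MIXEDMAP_PAGES,	/* Insert specific pages as a mixed map */
	MODEL_ACTION_CUSTOM,		/* Invoke a custom hook */
};

struct model_action {
	enum model_action_type type;
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Stand-in for a VMA descriptor: a proposed range plus the chosen action. */
struct model_vma_desc {
	unsigned long start;
	unsigned long end;
	struct model_action action;
};

/* Made-up driver hook: choose the action at prepare time, touch no VMA. */
int model_driver_mmap_prepare(struct model_vma_desc *desc,
			      bool is_device_memory, unsigned long base_pfn)
{
	if (desc->end <= desc->start)
		return -EINVAL;

	desc->action.type = is_device_memory ? MODEL_ACTION_REMAP_PFN
					     : MODEL_ACTION_MIXEDMAP;
	desc->action.start_pfn = base_pfn;
	desc->action.nr_pages = (desc->end - desc->start) >> MODEL_PAGE_SHIFT;
	return 0;
}

/* Stand-in for the core: report which generic operation it would run. */
const char *model_core_describe_action(const struct model_vma_desc *desc)
{
	switch (desc->action.type) {
	case MODEL_ACTION_NONE:
		return "nothing";
	case MODEL_ACTION_REMAP_PFN:
		return "remap pfn";
	case MODEL_ACTION_MIXEDMAP:
		return "insert mixed";
	case MODEL_ACTION_MIXEDMAP_PAGES:
		return "insert mixed pages";
	case MODEL_ACTION_CUSTOM:
		return "custom";
	}
	return "unknown";
}
```

In this model a hook called with is_device_memory set ends up requesting a
PFN remap, otherwise a mixed map; in both cases the driver never sees a VMA.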

This significantly expands the capabilities of the mmap_prepare hook, while
maintaining as much control as possible in the mm logic.

In the custom hook case, which unfortunately we have to provide for the
obstinate drivers which insist on doing 'interesting' things, we make it
possible for them to invoke mmap actions themselves via
mmap_action_prepare() (to be called in mmap_prepare as necessary) and
mmap_action_complete() (to be called in the custom hook).

This way, we keep as much logic in generic code as possible even in the
custom case.
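
A hedged sketch of how that pairing might look, again as a userspace model
with invented names. Here model_action_prepare() and model_action_complete()
merely stand in for the helpers named above, whose real signatures are not
shown in this cover letter; the driver and its state are made up:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are the kernel's definitions. */
enum model_action_type {
	MODEL_ACTION_NONE,
	MODEL_ACTION_REMAP_PFN,
	MODEL_ACTION_CUSTOM,
};

struct model_vma {
	unsigned long start, end;
	int populated;			/* set once the generic remap has run */
};

struct model_action {
	enum model_action_type type;
	int prepared;			/* set by model_action_prepare() */
	int (*custom_hook)(struct model_vma *vma);
};

struct model_vma_desc {
	unsigned long start, end;
	struct model_action action;
};

/* Generic step 1: validate against the descriptor, before any VMA exists. */
int model_action_prepare(struct model_action *action,
			 const struct model_vma_desc *desc)
{
	if (desc->end <= desc->start)
		return -EINVAL;
	action->prepared = 1;
	return 0;
}

/* Generic step 2: act on the VMA once it is fully established. */
int model_action_complete(struct model_action *action, struct model_vma *vma)
{
	if (!action->prepared)
		return -EINVAL;
	if (action->type == MODEL_ACTION_REMAP_PFN)
		vma->populated = 1;
	return 0;
}

/* Made-up driver state: the generic action it wants to run from its hook. */
static struct model_action driver_remap = { .type = MODEL_ACTION_REMAP_PFN };
static int driver_hook_calls;

static int driver_custom_hook(struct model_vma *vma)
{
	driver_hook_calls++;		/* the driver's 'interesting' extra work */
	return model_action_complete(&driver_remap, vma);
}

int driver_mmap_prepare(struct model_vma_desc *desc)
{
	desc->action.type = MODEL_ACTION_CUSTOM;
	desc->action.custom_hook = driver_custom_hook;
	return model_action_prepare(&driver_remap, desc);
}

/* Core: call prepare, establish the VMA, then run the requested action. */
int model_core_mmap(struct model_vma_desc *desc, struct model_vma *vma)
{
	int err = driver_mmap_prepare(desc);

	if (err)
		return err;
	vma->start = desc->start;
	vma->end = desc->end;
	vma->populated = 0;
	if (desc->action.type == MODEL_ACTION_CUSTOM)
		return desc->action.custom_hook(vma);
	return 0;
}
```

The driver-specific part is confined to driver_custom_hook(); validation and
population both remain in the generic helpers.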

At the point at which the VMA becomes accessible, it is safe for it to be
manipulated, as it will already be fully established in the maple tree, and
error handling can be simplified to unmapping the VMA.

We split the remap_pfn_range*() functions, which perform a PFN remap (a
typical mapping prepopulation operation), into prepare and complete steps,
and add io_remap_pfn_range_prepare() and io_remap_pfn_range_complete() for a
similar purpose.
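
To make the split concrete, the following userspace model shows one
plausible division of labour: the prepare step sees only the descriptor, and
the complete step sees only the established VMA. The model_* names, the flag
value and the pfns[] array standing in for page tables are all assumptions
made for the sketch, not the signatures added by this series:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define MODEL_MAX_PAGES		8
#define MODEL_VM_PFNMAP		0x1UL	/* invented flag value */

struct model_vma_desc {
	unsigned long start, end, pgoff, vm_flags;
};

struct model_vma {
	unsigned long start, end, vm_flags;
	unsigned long pfns[MODEL_MAX_PAGES];	/* stand-in for page tables */
};

/* Phase 1: runs from the prepare hook; only the descriptor is touched. */
int model_remap_prepare(struct model_vma_desc *desc, unsigned long pfn)
{
	if (desc->end <= desc->start)
		return -EINVAL;
	if (((desc->end - desc->start) >> MODEL_PAGE_SHIFT) > MODEL_MAX_PAGES)
		return -EINVAL;

	desc->vm_flags |= MODEL_VM_PFNMAP;
	desc->pgoff = pfn;
	return 0;
}

/* Stand-in for the core establishing a VMA from the descriptor. */
void model_establish_vma(struct model_vma *vma,
			 const struct model_vma_desc *desc)
{
	memset(vma, 0, sizeof(*vma));
	vma->start = desc->start;
	vma->end = desc->end;
	vma->vm_flags = desc->vm_flags;
}

/* Phase 2: runs once the VMA is established; fills in the mappings. */
int model_remap_complete(struct model_vma *vma, unsigned long pfn)
{
	unsigned long i, nr = (vma->end - vma->start) >> MODEL_PAGE_SHIFT;

	if (!(vma->vm_flags & MODEL_VM_PFNMAP))
		return -EINVAL;		/* the prepare step was skipped */
	if (nr > MODEL_MAX_PAGES)
		return -EINVAL;

	for (i = 0; i < nr; i++)
		vma->pfns[i] = pfn + i;
	return 0;
}
```

If the complete step fails in this model, the only state to undo is the VMA
itself, so cleanup reduces to unmapping it.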

From there we update various mm-adjacent logic to use this functionality as
a first set of changes, as well as the resctrl and cramfs filesystems to
round off the non-stacked filesystem instances.

We also add success and error hooks for post-action processing, e.g. to
output a debug log on success or to filter error codes.
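
The dispatch this implies could look roughly like the model below. The
structure layout, model_action_finish() and the driver hooks are
hypothetical; in particular, which error codes a real user would filter is
not specified here:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_vma {
	unsigned long start, end;
};

/* Hypothetical post-action hooks; either may be left NULL. */
struct model_action {
	int (*success_hook)(const struct model_vma *vma);
	int (*error_hook)(int err);
};

/* Core: called with the result of the action, lets the hooks post-process. */
int model_action_finish(const struct model_action *action,
			const struct model_vma *vma, int err)
{
	if (err)
		return action->error_hook ? action->error_hook(err) : err;
	return action->success_hook ? action->success_hook(vma) : 0;
}

/* Made-up driver: count successes in place of emitting a debug log line. */
static int driver_success_count;

static int driver_success_hook(const struct model_vma *vma)
{
	(void)vma;
	driver_success_count++;
	return 0;
}

/* Made-up filter: pass -ENOMEM through, report anything else as -EAGAIN. */
static int driver_error_hook(int err)
{
	return err == -ENOMEM ? err : -EAGAIN;
}

const struct model_action driver_action = {
	.success_hook = driver_success_hook,
	.error_hook = driver_error_hook,
};
```

An action with no hooks set simply propagates the result unchanged.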

v2:
* Propagated tags, thanks everyone! :)
* Refactored resctrl patch to avoid assigned-but-not-used variable.
* Updated resctrl change to not use .mmap_abort as discussed with Jason.
* Removed .mmap_abort as discussed with Jason.
* Removed references to .mmap_abort from documentation.
* Fixed silly VM_WARN_ON_ONCE() mistake (asserting opposite of what we mean
  to) as per report from Alexander.
* Fixed relay kerneldoc error.
* Renamed __mmap_prelude to __mmap_setup, keep __mmap_complete the same as
  per David.
* Fixed docs typo in mmap_complete description + formatted bold rather than
  capitalised as per Randy.
* Eliminated mmap_complete and reworked it into actions specified in
  mmap_prepare (via vm_area_desc), which therefore eliminates the driver's
  ability to do anything crazy and allows us to control generic logic.
* Added helper functions for these - vma_desc_set_remap(),
  vma_desc_set_mixedmap().
* Unfortunately, however, had to add post-action hooks to vm_area_desc, as
  hugetlbfs, for instance, already needs to access the VMA to function
  correctly. It is at least the smallest possible means of doing this.
* Updated VMA test logic, the stacked filesystem compatibility layer and
  documentation to reflect this.
* Updated hugetlbfs implementation to use new approach, and refactored to
  accept desc where at all possible and to do as much as possible in
  .mmap_prepare, and the minimum required in the new post_hook callback.
* Updated /dev/mem and /dev/zero mmap logic to use the new mechanism.
* Updated cramfs, resctrl to use the new mechanism.
* Updated proc_mmap hooks to only have proc_mmap_prepare.
* Updated the vmcore implementation to use the new hooks.
* Updated kcov to use the new hooks.
* Added hooks for success/failure for post-action handling.
* Added custom action hook for truly custom cases.
* Abstracted actions to separate type so we can use generic custom actions in
  custom handlers when necessary.
* Added callout re: lock issue raised in
  https://lore.kernel.org/linux-mm/20250801162930.GB184255@nvidia.com/ as per
  discussion with Jason.

v1:
https://lore.kernel.org/all/cover.1757329751.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (16):
  mm/shmem: update shmem to use mmap_prepare
  device/dax: update devdax to use mmap_prepare
  mm: add vma_desc_size(), vma_desc_pages() helpers
  relay: update relay to use mmap_prepare
  mm/vma: rename __mmap_prepare() function to avoid confusion
  mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
  mm: introduce io_remap_pfn_range_[prepare, complete]()
  mm: add ability to take further action in vm_area_desc
  doc: update porting, vfs documentation for mmap_prepare actions
  mm/hugetlbfs: update hugetlbfs to use mmap_prepare
  mm: update mem char driver to use mmap_prepare
  mm: update resctl to use mmap_prepare
  mm: update cramfs to use mmap_prepare
  fs/proc: add the proc_mmap_prepare hook for procfs
  fs/proc: update vmcore to use .proc_mmap_prepare
  kcov: update kcov to use mmap_prepare

 Documentation/filesystems/porting.rst |   5 +
 Documentation/filesystems/vfs.rst     |   4 +
 arch/csky/include/asm/pgtable.h       |   5 +
 arch/mips/alchemy/common/setup.c      |  28 ++++-
 arch/mips/include/asm/pgtable.h       |  10 ++
 arch/s390/kernel/crash_dump.c         |   6 +-
 arch/sparc/include/asm/pgtable_32.h   |  29 ++++-
 arch/sparc/include/asm/pgtable_64.h   |  29 ++++-
 drivers/char/mem.c                    |  75 ++++++------
 drivers/dax/device.c                  |  32 +++--
 fs/cramfs/inode.c                     |  46 ++++----
 fs/hugetlbfs/inode.c                  |  30 +++--
 fs/ntfs3/file.c                       |   2 +-
 fs/proc/inode.c                       |  12 +-
 fs/proc/vmcore.c                      |  54 ++++++---
 fs/resctrl/pseudo_lock.c              |  22 ++--
 include/linux/hugetlb.h               |   9 +-
 include/linux/hugetlb_inline.h        |  15 ++-
 include/linux/mm.h                    |  83 ++++++++++++-
 include/linux/mm_types.h              |  61 ++++++++++
 include/linux/proc_fs.h               |   1 +
 include/linux/shmem_fs.h              |   3 +-
 include/linux/vmalloc.h               |  10 +-
 kernel/kcov.c                         |  42 ++++---
 kernel/relay.c                        |  33 +++---
 mm/hugetlb.c                          |  77 +++++++-----
 mm/memory.c                           | 128 ++++++++++++--------
 mm/secretmem.c                        |   2 +-
 mm/shmem.c                            |  49 ++++++--
 mm/util.c                             | 150 ++++++++++++++++++++++-
 mm/vma.c                              |  74 ++++++++----
 mm/vmalloc.c                          |  16 ++-
 tools/testing/vma/vma_internal.h      | 164 +++++++++++++++++++++++++-
 33 files changed, 1002 insertions(+), 304 deletions(-)

--
2.51.0


Thread overview: 55+ messages
2025-09-10 20:21 Lorenzo Stoakes [this message]
2025-09-10 20:21 ` [PATCH v2 01/16] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
2025-09-11  8:32   ` Jan Kara
2025-09-10 20:21 ` [PATCH v2 02/16] device/dax: update devdax " Lorenzo Stoakes
2025-09-11  8:35   ` Jan Kara
2025-09-10 20:21 ` [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
2025-09-11  8:36   ` Jan Kara
2025-09-12 17:56   ` David Hildenbrand
2025-09-15 10:12     ` Lorenzo Stoakes
2025-09-10 20:21 ` [PATCH v2 04/16] relay: update relay to use mmap_prepare Lorenzo Stoakes
2025-09-11  8:38   ` Jan Kara
2025-09-10 20:22 ` [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
2025-09-12 17:57   ` David Hildenbrand
2025-09-15 10:12     ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 06/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 07/16] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
2025-09-12 10:23   ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
2025-09-11 22:07   ` Reinette Chatre
2025-09-12 10:18     ` Lorenzo Stoakes
2025-09-12 10:25   ` Lorenzo Stoakes
2025-09-13 22:54   ` Chris Mason
2025-09-15  9:56     ` Lorenzo Stoakes
2025-09-15 10:09   ` Lorenzo Stoakes
2025-09-15 12:11   ` Jason Gunthorpe
2025-09-15 12:23     ` Lorenzo Stoakes
2025-09-15 12:42       ` Jason Gunthorpe
2025-09-15 12:54         ` Lorenzo Stoakes
2025-09-15 13:11           ` Jason Gunthorpe
2025-09-15 13:51             ` Lorenzo Stoakes
2025-09-15 14:34               ` Jason Gunthorpe
2025-09-15 15:04                 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
2025-09-11  8:55   ` Jan Kara
2025-09-12 10:19     ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 10/16] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 11/16] mm: update mem char driver " Lorenzo Stoakes
2025-09-18 19:11   ` Chris Mason
2025-09-19  5:13     ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 12/16] mm: update resctl " Lorenzo Stoakes
2025-09-11 22:07   ` Reinette Chatre
2025-09-12 10:14     ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 13/16] mm: update cramfs " Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 14/16] fs/proc: add the proc_mmap_prepare hook for procfs Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 15/16] fs/proc: update vmcore to use .proc_mmap_prepare Lorenzo Stoakes
2025-09-12 10:14   ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 16/16] kcov: update kcov to use mmap_prepare Lorenzo Stoakes
2025-09-15 12:16   ` Jason Gunthorpe
2025-09-15 12:43     ` Lorenzo Stoakes
2025-09-15 12:48       ` Jason Gunthorpe
2025-09-15 13:01         ` Lorenzo Stoakes
2025-09-18 19:45   ` Chris Mason
2025-09-19  5:10     ` Lorenzo Stoakes
2025-09-10 21:38 ` [PATCH v2 00/16] expand mmap_prepare functionality, port more users Andrew Morton
2025-09-11  5:19   ` Lorenzo Stoakes
