From: Suren Baghdasaryan <surenb@google.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	 Clemens Ladisch <clemens@ladisch.de>,
	Arnd Bergmann <arnd@arndb.de>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"K . Y . Srinivasan" <kys@microsoft.com>,
	 Haiyang Zhang <haiyangz@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>,  Dexuan Cui <decui@microsoft.com>,
	Long Li <longli@microsoft.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	 Maxime Coquelin <mcoquelin.stm32@gmail.com>,
	Alexandre Torgue <alexandre.torgue@foss.st.com>,
	 Miquel Raynal <miquel.raynal@bootlin.com>,
	Richard Weinberger <richard@nod.at>,
	 Vignesh Raghavendra <vigneshr@ti.com>,
	Bodo Stroesser <bostroesser@gmail.com>,
	 "Martin K . Petersen" <martin.petersen@oracle.com>,
	David Howells <dhowells@redhat.com>,
	 Marc Dionne <marc.dionne@auristor.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	David Hildenbrand <david@kernel.org>,
	 "Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Mike Rapoport <rppt@kernel.org>, Michal Hocko <mhocko@suse.com>,
	Jann Horn <jannh@google.com>,  Pedro Falcato <pfalcato@suse.de>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-hyperv@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	 linux-arm-kernel@lists.infradead.org,
	linux-mtd@lists.infradead.org,  linux-staging@lists.linux.dev,
	linux-scsi@vger.kernel.org,  target-devel@vger.kernel.org,
	linux-afs@lists.infradead.org,  linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org,  Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
Date: Mon, 16 Mar 2026 15:59:26 -0700
Message-ID: <CAJuCfpEjTw1nQik_HWXHg2su2DwzPrn5NPGpeAVPrjJK0tOSkg@mail.gmail.com>
In-Reply-To: <6a0e73a5-519e-49ca-9f76-2f6cc5a1577c@lucifer.local>

On Mon, Mar 16, 2026 at 12:17 PM Lorenzo Stoakes (Oracle)
<ljs@kernel.org> wrote:
>
> On Sun, Mar 15, 2026 at 04:23:14PM -0700, Suren Baghdasaryan wrote:
> > On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > >
> > > This documentation makes it easier for a driver/file system implementer to
> > > correctly use this callback.
> > >
> > > It covers the fundamentals, whilst intentionally leaving the less lovely
> > > possible actions one might take undocumented (for instance - the
> > > success_hook, error_hook fields in mmap_action).
> > >
> > > The document also covers the new VMA flags implementation which is the only
> > > one which will work correctly with mmap_prepare.
> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > >  Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> > >  1 file changed, 131 insertions(+)
> > >  create mode 100644 Documentation/filesystems/mmap_prepare.rst
> > >
> > > diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> > > new file mode 100644
> > > index 000000000000..76908200f3a1
> > > --- /dev/null
> > > +++ b/Documentation/filesystems/mmap_prepare.rst
> > > @@ -0,0 +1,131 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +
> > > +===========================
> > > +mmap_prepare callback HOWTO
> > > +===========================
> > > +
> > > +Introduction
> > > +############
> > > +
> > > +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> > > +stability and security risk, and doesn't always permit the merging of adjacent
> > > +mappings resulting in unnecessary memory fragmentation.
> > > +
> > > +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> > > +these problems.
> > > +
> > > +## How To Use
> > > +
> > > +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> > > +callback rather than an `mmap` one, e.g. for ext4:
> > > +
> > > +
> > > +.. code-block:: C
> > > +
> > > +    const struct file_operations ext4_file_operations = {
> > > +        ...
> > > +        .mmap_prepare    = ext4_file_mmap_prepare,
> > > +    };
> > > +
> > > +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> > > +
> > > +Examining the `struct vm_area_desc` type:
> > > +
> > > +.. code-block:: C
> > > +
> > > +    struct vm_area_desc {
> > > +        /* Immutable state. */
> > > +        const struct mm_struct *const mm;
> > > +        struct file *const file; /* May vary from vm_file in stacked callers. */
> > > +        unsigned long start;
> > > +        unsigned long end;
> > > +
> > > +        /* Mutable fields. Populated with initial state. */
> > > +        pgoff_t pgoff;
> > > +        struct file *vm_file;
> > > +        vma_flags_t vma_flags;
> > > +        pgprot_t page_prot;
> > > +
> > > +        /* Write-only fields. */
> > > +        const struct vm_operations_struct *vm_ops;
> > > +        void *private_data;
> > > +
> > > +        /* Take further action? */
> > > +        struct mmap_action action;
> >
> > So, action still belongs to /* Write-only fields. */ section? This is
> > nitpicky, but it might be better to have this as:
> >
> >         /* Write-only fields. */
> >         const struct vm_operations_struct *vm_ops;
> >         void *private_data;
> >         struct mmap_action action; /* Take further action? */
>
> Absolutely not. This field is not to be written to by the user.
>
> We sadly have to allow hugetlb to do some hacks, but these are things we don't
> want to point out.

Ack.

>
> Users should use mmap_action_xxx() functions.
>
> >
> > > +    };
> > > +
> > > +This is straightforward - you have all the fields you need to set up the
> > > +mapping, and you can update the mutable and writable fields, for instance:
> > > +
> > > +.. code-block:: C
> > > +
> > > +    static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> > > +    {
> > > +        int ret;
> > > +        struct file *file = desc->file;
> > > +        struct inode *inode = file->f_mapping->host;
> > > +
> > > +        ...
> > > +
> > > +        file_accessed(file);
> > > +        if (IS_DAX(file_inode(file))) {
> > > +            desc->vm_ops = &ext4_dax_vm_ops;
> > > +            vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> > > +        } else {
> > > +            desc->vm_ops = &ext4_file_vm_ops;
> > > +        }
> > > +        return 0;
> > > +    }
> > > +
> > > +Importantly, you no longer have to dance around with reference counts or locks
> > > +when updating these fields - **you can simply go ahead and change them**.
> > > +
> > > +Everything is taken care of by the mapping code.
> > > +
> > > +VMA Flags
> > > +=========
> > > +
> > > +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> > > +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> > > +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> > > +locking done correctly for you), this is no longer necessary.
> > > +
> > > +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> > > +etc. - i.e. using a `VM_xxx` macro has changed too.
> > > +
> > > +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> > > +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> > > +of (where `desc` is a pointer to `struct vm_area_desc`):
> > > +
> > > +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> > > +  wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
> > > +  VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> > > +  otherwise `false`.
> > > +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> > > +  additional flags specified by a comma-separated list,
> > > +  e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> > > +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> > > +  flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> > > +  VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> > > +
> > > +Actions
> > > +=======
> > > +
> > > +You can now very easily have actions be performed upon a mapping once set up by
> > > +utilising simple helper functions invoked upon the `struct vm_area_desc`
> > > +pointer. These are:
> > > +
> > > +* `mmap_action_remap()` - Remaps a range consisting only of PFNs for a specific
> > > +  range starting at a virtual address and PFN, of a set size.
> > > +
> > > +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> > > +  entire mapping from `start_pfn` onward.
> > > +
> > > +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> > > +  remap.
> > > +
> > > +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> > > +  the entire mapping from `start_pfn` onward.
> > > +
> > > +**NOTE:** The 'action' field should never normally be manipulated directly,
> > > +rather you ought to use one of these helpers.
> >
> > I'm guessing the start and size parameters passed to
> > mmap_action_remap() and such are restricted by vm_area_desc.start and
> > vm_area_desc.end. If so, should we document those restrictions and
> > enforce them in the code?
>
> I mean it's the same restrictions as all of the functions already apply if you
> were to use them with a VMA descriptor.
>
> I think implicitly a remap will fail if you try it out of the VMA range at the
> point of applying the change.
>
> But it might be worth adding range_in_vma_desc() checks at prepare time, will
> see if I can do that for the respin.
>
> I think it's pretty obvious that you shouldn't be trying to remap totally
> unrelated memory, so I'm not sure that's at a level of granularity that's suited
> to this document though.

I just saw you already have WARN_ON_ONCE() inside mmap_action_remap()
to check for these limits, so code-wise I think we are already good.

For documentation I'll rely on your judgement whether to mention this or not.
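
For illustration only, here is a minimal userspace sketch of the kind of
prepare-time bounds check being discussed, assuming the remap target is
described by a start address and a size. `struct desc_range` and
remap_range_in_desc() are hypothetical names, loosely modelled on the
existing range_in_vma() helper; they are not the proposed
range_in_vma_desc() and not the WARN_ON_ONCE() logic inside
mmap_action_remap(). The sketch also treats a zero-length range as invalid,
which is an assumption on my part.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the immutable range fields of
 * struct vm_area_desc: [start, end), end exclusive.
 */
struct desc_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Hypothetical check: true only if [addr, addr + size) lies entirely
 * within [desc->start, desc->end). Zero-length ranges are rejected.
 */
static bool remap_range_in_desc(const struct desc_range *desc,
				unsigned long addr, unsigned long size)
{
	/* Reject empty ranges and start addresses outside the mapping. */
	if (size == 0 || addr < desc->start || addr >= desc->end)
		return false;

	/*
	 * Compare against the space remaining in the mapping instead of
	 * computing addr + size, which could wrap around for a huge size.
	 */
	return size <= desc->end - addr;
}
```

Comparing size against end - addr rather than forming addr + size keeps the
check correct even when a caller passes an absurdly large size.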

>
> >
> > > +    struct vm_area_desc {
> > > +        /* Immutable state. */
> > > +        const struct mm_struct *const mm;
> > > +        struct file *const file; /* May vary from vm_file in stacked callers. */
> > > +        unsigned long start;
> > > +        unsigned long end;
> >
> >
> > > --
> > > 2.53.0
> > >
