* [PATCH 00/15] mm: expand mmap_prepare functionality and usage
@ 2026-03-12 20:27 Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 01/15] mm: various small mmap_prepare cleanups Lorenzo Stoakes (Oracle)
` (15 more replies)
0 siblings, 16 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
This series expands the mmap_prepare functionality, which is intended to
replace the deprecated f_op->mmap hook, which has been a source of bugs
and security issues for some time.
This series starts with some cleanup of existing mmap_prepare logic, then
adds documentation for the mmap_prepare call to make it easier for
filesystem and driver writers to understand how it works.
It then, importantly, adds a vm_ops->mapped hook, a key feature previously
missing from mmap_prepare - this is invoked when a VMA whose driver
specifies mmap_prepare has successfully been mapped, but not when it has
been merged with another VMA.
Importantly, mmap_prepare is invoked prior to a merge being attempted, so
you cannot manipulate state such as reference counts as if it were a new
mapping.
The vm_ops->mapped hook allows a driver to perform tasks required at this
stage, and provides symmetry with subsequent vm_ops->open() and
vm_ops->close() calls.
The series uses this to correct the afs implementation, which wrongly
manipulated the reference count at mmap_prepare time.
It then adds an mmap_prepare equivalent of vm_iomap_memory() -
mmap_action_simple_ioremap(), then uses this to update a number of drivers.
It then splits out the mmap_prepare compatibility layer (which allows for
invocation of mmap_prepare hooks in an mmap() hook) in such a way as to
allow for more incremental implementation of mmap_prepare hooks.
It then uses this to extend mmap_prepare usage in drivers.
Finally it adds an mmap_prepare equivalent of vm_map_pages(), which lays
the foundation for future work which will extend mmap_prepare to DMA
coherent mappings.
Lorenzo Stoakes (Oracle) (15):
mm: various small mmap_prepare cleanups
mm: add documentation for the mmap_prepare file operation callback
mm: document vm_operations_struct->open the same as close()
mm: add vm_ops->mapped hook
fs: afs: correctly drop reference count on mapping failure
mm: add mmap_action_simple_ioremap()
misc: open-dice: replace deprecated mmap hook with mmap_prepare
hpet: replace deprecated mmap hook with mmap_prepare
mtdchar: replace deprecated mmap hook with mmap_prepare, clean up
stm: replace deprecated mmap hook with mmap_prepare
staging: vme_user: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
uio: replace deprecated mmap hook with mmap_prepare in uio_info
mm: add mmap_action_map_kernel_pages[_full]()
Documentation/driver-api/vme.rst | 2 +-
Documentation/filesystems/mmap_prepare.rst | 141 ++++++++++++++
drivers/char/hpet.c | 12 +-
drivers/hv/hyperv_vmbus.h | 4 +-
drivers/hv/vmbus_drv.c | 27 ++-
drivers/hwtracing/stm/core.c | 31 ++-
drivers/misc/open-dice.c | 19 +-
drivers/mtd/mtdchar.c | 21 +-
drivers/staging/vme_user/vme.c | 20 +-
drivers/staging/vme_user/vme.h | 2 +-
drivers/staging/vme_user/vme_user.c | 51 +++--
drivers/target/target_core_user.c | 26 ++-
drivers/uio/uio.c | 10 +-
drivers/uio/uio_hv_generic.c | 11 +-
fs/afs/file.c | 20 +-
include/linux/fs.h | 14 +-
include/linux/hyperv.h | 4 +-
include/linux/mm.h | 158 +++++++++++++--
include/linux/mm_types.h | 17 +-
include/linux/uio_driver.h | 4 +-
mm/internal.h | 41 ++--
mm/memory.c | 174 ++++++++++++-----
mm/util.c | 213 +++++++++++++++------
mm/vma.c | 56 ++++--
mm/vma.h | 2 +-
tools/testing/vma/include/dup.h | 143 ++++++++++----
tools/testing/vma/include/stubs.h | 9 +-
27 files changed, 933 insertions(+), 299 deletions(-)
create mode 100644 Documentation/filesystems/mmap_prepare.rst
--
2.53.0
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 21:14 ` Andrew Morton
2026-03-15 22:56 ` Suren Baghdasaryan
2026-03-12 20:27 ` [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback Lorenzo Stoakes (Oracle)
` (14 subsequent siblings)
15 siblings, 2 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
Rather than passing arbitrary fields, pass an mmap_action field directly to
mmap prepare and complete helpers to put all the action-specific logic in
the function actually doing the work.
Additionally, allow mmap prepare functions to return an error so we can
error out as soon as possible if there is something logically incorrect in
the input.
Update remap_pfn_range_prepare() to properly check the input range for the
CoW case.
While we're here, make remap_pfn_range_prepare_vma() a little neater, and
pass mmap_action directly to call_action_complete().
Then, update compat_vma_mmap() to perform its logic directly, as
__compat_vma_mmap() is not used by anything else, so we no longer need to
export it.
Also update compat_vma_mmap() to use vfs_mmap_prepare() rather than calling
the mmap_prepare op directly.
Finally, update the VMA userland tests to reflect the changes.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
include/linux/fs.h | 2 -
include/linux/mm.h | 8 +--
mm/internal.h | 28 +++++---
mm/memory.c | 45 +++++++-----
mm/util.c | 112 +++++++++++++-----------------
mm/vma.c | 21 +++---
tools/testing/vma/include/dup.h | 9 ++-
tools/testing/vma/include/stubs.h | 9 +--
8 files changed, 123 insertions(+), 111 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25e..a2628a12bd2b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2058,8 +2058,6 @@ static inline bool can_mmap_file(struct file *file)
return true;
}
-int __compat_vma_mmap(const struct file_operations *f_op,
- struct file *file, struct vm_area_struct *vma);
int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4c4fd55fc823..cc5960a84382 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4116,10 +4116,10 @@ static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
}
-void mmap_action_prepare(struct mmap_action *action,
- struct vm_area_desc *desc);
-int mmap_action_complete(struct mmap_action *action,
- struct vm_area_struct *vma);
+int mmap_action_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action);
+int mmap_action_complete(struct vm_area_struct *vma,
+ struct mmap_action *action);
/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
diff --git a/mm/internal.h b/mm/internal.h
index 95b583e7e4f7..7bfa85b5e78b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1775,26 +1775,32 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
-void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
-int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t pgprot);
+int remap_pfn_range_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action);
+int remap_pfn_range_complete(struct vm_area_struct *vma,
+ struct mmap_action *action);
-static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
- unsigned long orig_pfn, unsigned long size)
+static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
{
+ const unsigned long orig_pfn = action->remap.start_pfn;
+ const unsigned long size = action->remap.size;
const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
- return remap_pfn_range_prepare(desc, pfn);
+ action->remap.start_pfn = pfn;
+ return remap_pfn_range_prepare(desc, action);
}
static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
- unsigned long addr, unsigned long orig_pfn, unsigned long size,
- pgprot_t orig_prot)
+ struct mmap_action *action)
{
- const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
- const pgprot_t prot = pgprot_decrypted(orig_prot);
+ const unsigned long size = action->remap.size;
+ const unsigned long orig_pfn = action->remap.start_pfn;
+ const pgprot_t orig_prot = vma->vm_page_prot;
- return remap_pfn_range_complete(vma, addr, pfn, size, prot);
+ action->remap.pgprot = pgprot_decrypted(orig_prot);
+ action->remap.start_pfn = io_remap_pfn_range_pfn(orig_pfn, size);
+ return remap_pfn_range_complete(vma, action);
}
#ifdef CONFIG_MMU_NOTIFIER
diff --git a/mm/memory.c b/mm/memory.c
index 6aa0ea4af1fc..364fa8a45360 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3099,26 +3099,34 @@ static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
}
#endif
-void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+int remap_pfn_range_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
{
- /*
- * We set addr=VMA start, end=VMA end here, so this won't fail, but we
- * check it again on complete and will fail there if specified addr is
- * invalid.
- */
- get_remap_pgoff(vma_desc_is_cow_mapping(desc), desc->start, desc->end,
- desc->start, desc->end, pfn, &desc->pgoff);
+ const unsigned long start = action->remap.start;
+ const unsigned long end = start + action->remap.size;
+ const unsigned long pfn = action->remap.start_pfn;
+ const bool is_cow = vma_desc_is_cow_mapping(desc);
+ int err;
+
+ err = get_remap_pgoff(is_cow, start, end, desc->start, desc->end, pfn,
+ &desc->pgoff);
+ if (err)
+ return err;
+
vma_desc_set_flags_mask(desc, VMA_REMAP_FLAGS);
+ return 0;
}
-static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size)
+static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn,
+ unsigned long size)
{
- unsigned long end = addr + PAGE_ALIGN(size);
+ const unsigned long end = addr + PAGE_ALIGN(size);
+ const bool is_cow = is_cow_mapping(vma->vm_flags);
int err;
- err = get_remap_pgoff(is_cow_mapping(vma->vm_flags), addr, end,
- vma->vm_start, vma->vm_end, pfn, &vma->vm_pgoff);
+ err = get_remap_pgoff(is_cow, addr, end, vma->vm_start, vma->vm_end,
+ pfn, &vma->vm_pgoff);
if (err)
return err;
@@ -3151,10 +3159,15 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
}
EXPORT_SYMBOL(remap_pfn_range);
-int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+int remap_pfn_range_complete(struct vm_area_struct *vma,
+ struct mmap_action *action)
{
- return do_remap_pfn_range(vma, addr, pfn, size, prot);
+ const unsigned long start = action->remap.start;
+ const unsigned long pfn = action->remap.start_pfn;
+ const unsigned long size = action->remap.size;
+ const pgprot_t prot = action->remap.pgprot;
+
+ return do_remap_pfn_range(vma, start, pfn, size, prot);
}
/**
diff --git a/mm/util.c b/mm/util.c
index ce7ae80047cf..dba1191725b6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1163,43 +1163,6 @@ void flush_dcache_folio(struct folio *folio)
EXPORT_SYMBOL(flush_dcache_folio);
#endif
-/**
- * __compat_vma_mmap() - See description for compat_vma_mmap()
- * for details. This is the same operation, only with a specific file operations
- * struct which may or may not be the same as vma->vm_file->f_op.
- * @f_op: The file operations whose .mmap_prepare() hook is specified.
- * @file: The file which backs or will back the mapping.
- * @vma: The VMA to apply the .mmap_prepare() hook to.
- * Returns: 0 on success or error.
- */
-int __compat_vma_mmap(const struct file_operations *f_op,
- struct file *file, struct vm_area_struct *vma)
-{
- struct vm_area_desc desc = {
- .mm = vma->vm_mm,
- .file = file,
- .start = vma->vm_start,
- .end = vma->vm_end,
-
- .pgoff = vma->vm_pgoff,
- .vm_file = vma->vm_file,
- .vma_flags = vma->flags,
- .page_prot = vma->vm_page_prot,
-
- .action.type = MMAP_NOTHING, /* Default */
- };
- int err;
-
- err = f_op->mmap_prepare(&desc);
- if (err)
- return err;
-
- mmap_action_prepare(&desc.action, &desc);
- set_vma_from_desc(vma, &desc);
- return mmap_action_complete(&desc.action, vma);
-}
-EXPORT_SYMBOL(__compat_vma_mmap);
-
/**
* compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
* existing VMA and execute any requested actions.
@@ -1228,7 +1191,31 @@ EXPORT_SYMBOL(__compat_vma_mmap);
*/
int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
{
- return __compat_vma_mmap(file->f_op, file, vma);
+ struct vm_area_desc desc = {
+ .mm = vma->vm_mm,
+ .file = file,
+ .start = vma->vm_start,
+ .end = vma->vm_end,
+
+ .pgoff = vma->vm_pgoff,
+ .vm_file = vma->vm_file,
+ .vma_flags = vma->flags,
+ .page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
+ };
+ int err;
+
+ err = vfs_mmap_prepare(file, &desc);
+ if (err)
+ return err;
+
+ err = mmap_action_prepare(&desc, &desc.action);
+ if (err)
+ return err;
+
+ set_vma_from_desc(vma, &desc);
+ return mmap_action_complete(vma, &desc.action);
}
EXPORT_SYMBOL(compat_vma_mmap);
@@ -1320,8 +1307,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
}
}
-static int mmap_action_finish(struct mmap_action *action,
- const struct vm_area_struct *vma, int err)
+static int mmap_action_finish(struct vm_area_struct *vma,
+ struct mmap_action *action, int err)
{
/*
* If an error occurs, unmap the VMA altogether and return an error. We
@@ -1355,35 +1342,36 @@ static int mmap_action_finish(struct mmap_action *action,
* action which need to be performed.
* @desc: The VMA descriptor to prepare for @action.
* @action: The action to perform.
+ *
+ * Returns: 0 on success, otherwise error.
*/
-void mmap_action_prepare(struct mmap_action *action,
- struct vm_area_desc *desc)
+int mmap_action_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
+
{
switch (action->type) {
case MMAP_NOTHING:
- break;
+ return 0;
case MMAP_REMAP_PFN:
- remap_pfn_range_prepare(desc, action->remap.start_pfn);
- break;
+ return remap_pfn_range_prepare(desc, action);
case MMAP_IO_REMAP_PFN:
- io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
- action->remap.size);
- break;
+ return io_remap_pfn_range_prepare(desc, action);
}
}
EXPORT_SYMBOL(mmap_action_prepare);
/**
* mmap_action_complete - Execute VMA descriptor action.
- * @action: The action to perform.
* @vma: The VMA to perform the action upon.
+ * @action: The action to perform.
*
* Similar to mmap_action_prepare().
*
* Return: 0 on success, or error, at which point the VMA will be unmapped.
*/
-int mmap_action_complete(struct mmap_action *action,
- struct vm_area_struct *vma)
+int mmap_action_complete(struct vm_area_struct *vma,
+ struct mmap_action *action)
+
{
int err = 0;
@@ -1391,23 +1379,19 @@ int mmap_action_complete(struct mmap_action *action,
case MMAP_NOTHING:
break;
case MMAP_REMAP_PFN:
- err = remap_pfn_range_complete(vma, action->remap.start,
- action->remap.start_pfn, action->remap.size,
- action->remap.pgprot);
+ err = remap_pfn_range_complete(vma, action);
break;
case MMAP_IO_REMAP_PFN:
- err = io_remap_pfn_range_complete(vma, action->remap.start,
- action->remap.start_pfn, action->remap.size,
- action->remap.pgprot);
+ err = io_remap_pfn_range_complete(vma, action);
break;
}
- return mmap_action_finish(action, vma, err);
+ return mmap_action_finish(vma, action, err);
}
EXPORT_SYMBOL(mmap_action_complete);
#else
-void mmap_action_prepare(struct mmap_action *action,
- struct vm_area_desc *desc)
+int mmap_action_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
{
switch (action->type) {
case MMAP_NOTHING:
@@ -1417,11 +1401,13 @@ void mmap_action_prepare(struct mmap_action *action,
WARN_ON_ONCE(1); /* nommu cannot handle these. */
break;
}
+
+ return 0;
}
EXPORT_SYMBOL(mmap_action_prepare);
-int mmap_action_complete(struct mmap_action *action,
- struct vm_area_struct *vma)
+int mmap_action_complete(struct vm_area_struct *vma,
+ struct mmap_action *action)
{
int err = 0;
@@ -1436,7 +1422,7 @@ int mmap_action_complete(struct mmap_action *action,
break;
}
- return mmap_action_finish(action, vma, err);
+ return mmap_action_finish(vma, action, err);
}
EXPORT_SYMBOL(mmap_action_complete);
#endif
diff --git a/mm/vma.c b/mm/vma.c
index be64f781a3aa..054cf1d262fb 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2613,15 +2613,19 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
vma_set_page_prot(vma);
}
-static void call_action_prepare(struct mmap_state *map,
- struct vm_area_desc *desc)
+static int call_action_prepare(struct mmap_state *map,
+ struct vm_area_desc *desc)
{
struct mmap_action *action = &desc->action;
+ int err;
- mmap_action_prepare(action, desc);
+ err = mmap_action_prepare(desc, action);
+ if (err)
+ return err;
if (action->hide_from_rmap_until_complete)
map->hold_file_rmap_lock = true;
+ return 0;
}
/*
@@ -2645,7 +2649,9 @@ static int call_mmap_prepare(struct mmap_state *map,
if (err)
return err;
- call_action_prepare(map, desc);
+ err = call_action_prepare(map, desc);
+ if (err)
+ return err;
/* Update fields permitted to be changed. */
map->pgoff = desc->pgoff;
@@ -2700,13 +2706,12 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
}
static int call_action_complete(struct mmap_state *map,
- struct vm_area_desc *desc,
+ struct mmap_action *action,
struct vm_area_struct *vma)
{
- struct mmap_action *action = &desc->action;
int ret;
- ret = mmap_action_complete(action, vma);
+ ret = mmap_action_complete(vma, action);
/* If we held the file rmap we need to release it. */
if (map->hold_file_rmap_lock) {
@@ -2768,7 +2773,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
__mmap_complete(&map, vma);
if (have_mmap_prepare && allocated_new) {
- error = call_action_complete(&map, &desc, vma);
+ error = call_action_complete(&map, &desc.action, vma);
if (error)
return error;
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 5eb313beb43d..908beb263307 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -1106,7 +1106,7 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
.pgoff = vma->vm_pgoff,
.vm_file = vma->vm_file,
- .vm_flags = vma->vm_flags,
+ .vma_flags = vma->flags,
.page_prot = vma->vm_page_prot,
.action.type = MMAP_NOTHING, /* Default */
@@ -1117,9 +1117,12 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
if (err)
return err;
- mmap_action_prepare(&desc.action, &desc);
+ err = mmap_action_prepare(&desc, &desc.action);
+ if (err)
+ return err;
+
set_vma_from_desc(vma, &desc);
- return mmap_action_complete(&desc.action, vma);
+ return mmap_action_complete(vma, &desc.action);
}
static inline int compat_vma_mmap(struct file *file,
diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
index 947a3a0c2566..76c4b668bc62 100644
--- a/tools/testing/vma/include/stubs.h
+++ b/tools/testing/vma/include/stubs.h
@@ -81,13 +81,14 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
{
}
-static inline void mmap_action_prepare(struct mmap_action *action,
- struct vm_area_desc *desc)
+static inline int mmap_action_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
{
+ return 0;
}
-static inline int mmap_action_complete(struct mmap_action *action,
- struct vm_area_struct *vma)
+static inline int mmap_action_complete(struct vm_area_struct *vma,
+ struct mmap_action *action)
{
return 0;
}
--
2.53.0
* [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 01/15] mm: various small mmap_prepare cleanups Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-13 0:12 ` Randy Dunlap
2026-03-15 23:23 ` Suren Baghdasaryan
2026-03-12 20:27 ` [PATCH 03/15] mm: document vm_operations_struct->open the same as close() Lorenzo Stoakes (Oracle)
` (13 subsequent siblings)
15 siblings, 2 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
This documentation makes it easier for a driver/file system implementer to
correctly use this callback.
It covers the fundamentals, whilst intentionally leaving the less lovely
actions one might take undocumented (for instance, the success_hook and
error_hook fields in mmap_action).
The document also covers the new VMA flags implementation, which is the only
one that will work correctly with mmap_prepare.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
1 file changed, 131 insertions(+)
create mode 100644 Documentation/filesystems/mmap_prepare.rst
diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
new file mode 100644
index 000000000000..76908200f3a1
--- /dev/null
+++ b/Documentation/filesystems/mmap_prepare.rst
@@ -0,0 +1,131 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+mmap_prepare callback HOWTO
+===========================
+
+Introduction
+############
+
+The `struct file->f_op->mmap()` callback has been deprecated as it is both a
+stability and security risk, and doesn't always permit the merging of adjacent
+mappings, resulting in unnecessary memory fragmentation.
+
+It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
+these problems.
+
+How To Use
+##########
+
+In your driver's `struct file_operations`, specify an `mmap_prepare` callback
+rather than an `mmap` one, e.g. for ext4:
+
+
+.. code-block:: C
+
+ const struct file_operations ext4_file_operations = {
+ ...
+ .mmap_prepare = ext4_file_mmap_prepare,
+ };
+
+This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
+
+Examining the `struct vm_area_desc` type:
+
+.. code-block:: C
+
+ struct vm_area_desc {
+ /* Immutable state. */
+ const struct mm_struct *const mm;
+ struct file *const file; /* May vary from vm_file in stacked callers. */
+ unsigned long start;
+ unsigned long end;
+
+ /* Mutable fields. Populated with initial state. */
+ pgoff_t pgoff;
+ struct file *vm_file;
+ vma_flags_t vma_flags;
+ pgprot_t page_prot;
+
+ /* Write-only fields. */
+ const struct vm_operations_struct *vm_ops;
+ void *private_data;
+
+ /* Take further action? */
+ struct mmap_action action;
+ };
+
+This is straightforward - you have all the fields you need to set up the
+mapping, and you can update the mutable and write-only fields, for instance:
+
+.. code-block:: C
+
+ static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
+ {
+ int ret;
+ struct file *file = desc->file;
+ struct inode *inode = file->f_mapping->host;
+
+ ...
+
+ file_accessed(file);
+ if (IS_DAX(file_inode(file))) {
+ desc->vm_ops = &ext4_dax_vm_ops;
+ vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
+ } else {
+ desc->vm_ops = &ext4_file_vm_ops;
+ }
+ return 0;
+ }
+
+Importantly, you no longer have to dance around with reference counts or locks
+when updating these fields - **you can simply go ahead and change them**.
+
+Everything is taken care of by the mapping code.
+
+VMA Flags
+=========
+
+Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
+you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
+`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
+locking done correctly for you), this is no longer necessary.
+
+Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
+etc. - i.e. using a `VM_xxx` macro - has changed too.
+
+When implementing `mmap_prepare()`, reference flags by their bit number, defined
+as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
+of (where `desc` is a pointer to `struct vm_area_desc`):
+
+* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
+  wish to test for (whether *any* are set), e.g. - `vma_desc_test_flags(desc,
+ VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
+ otherwise `false`.
+* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
+ additional flags specified by a comma-separated list,
+ e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
+* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
+ flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
+ VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
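+
+For example, a hypothetical driver which wishes to refuse writable mappings
+and prevent the mapping from being expanded might do the following (a sketch
+only - the driver and the exact flag bits chosen are assumptions, named per
+the `VMA_xxx_BIT` convention):
+
+.. code-block:: C
+
+	static int foo_mmap_prepare(struct vm_area_desc *desc)
+	{
+		/* Refuse mappings which are, or may become, writable. */
+		if (vma_desc_test_flags(desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT))
+			return -EPERM;
+
+		/* Assumed bit, per the VMA_xxx_BIT convention. */
+		vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT);
+		return 0;
+	}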
+
+Actions
+=======
+
+You can now very easily have actions performed upon a mapping, once it is set
+up, by utilising simple helper functions invoked on the `struct vm_area_desc`
+pointer. These are:
+
+* `mmap_action_remap()` - Remaps a range consisting only of PFNs, starting at
+  a given virtual address and PFN, and of a specified size.
+
+* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
+ entire mapping from `start_pfn` onward.
+
+* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
+ remap.
+
+* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
+ the entire mapping from `start_pfn` onward.
+
+**NOTE:** The 'action' field should normally never be manipulated directly;
+rather, you ought to use one of these helpers.
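+
+For instance, a hypothetical driver whose device exposes an MMIO region might
+request that the entire mapping be I/O-remapped (a sketch only - the driver
+and its fields are assumptions; only the helpers are taken from this series):
+
+.. code-block:: C
+
+	static int foo_mmap_prepare(struct vm_area_desc *desc)
+	{
+		/* Hypothetical device state hung off the file. */
+		struct foo_device *foo = desc->file->private_data;
+		const unsigned long start_pfn = foo->mmio_phys >> PAGE_SHIFT;
+
+		if (vma_desc_size(desc) > foo->mmio_size)
+			return -EINVAL;
+
+		/* Ask the mmap code to perform the I/O remap once ready. */
+		mmap_action_ioremap_full(desc, start_pfn);
+		return 0;
+	}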
--
2.53.0
* [PATCH 03/15] mm: document vm_operations_struct->open the same as close()
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 01/15] mm: various small mmap_prepare cleanups Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-16 0:43 ` Suren Baghdasaryan
2026-03-12 20:27 ` [PATCH 04/15] mm: add vm_ops->mapped hook Lorenzo Stoakes (Oracle)
` (12 subsequent siblings)
15 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
Describe when the operation is invoked and the context in which it is
invoked, matching the description already added for vm_op->close().
While we're here, update all outdated uses of 'area' as the parameter name
for VMAs to the more consistent 'vma'.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
include/linux/mm.h | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cc5960a84382..12a0b4c63736 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -748,15 +748,20 @@ struct vm_uffd_ops;
* to the functions called when a no-page or a wp-page exception occurs.
*/
struct vm_operations_struct {
- void (*open)(struct vm_area_struct * area);
+ /**
+ * @open: Called when a VMA is remapped or split. Not called upon first
+ * mapping a VMA.
+ * Context: User context. May sleep. Caller holds mmap_lock.
+ */
+ void (*open)(struct vm_area_struct *vma);
/**
* @close: Called when the VMA is being removed from the MM.
* Context: User context. May sleep. Caller holds mmap_lock.
*/
- void (*close)(struct vm_area_struct * area);
+ void (*close)(struct vm_area_struct *vma);
/* Called any time before splitting to check if it's allowed */
- int (*may_split)(struct vm_area_struct *area, unsigned long addr);
- int (*mremap)(struct vm_area_struct *area);
+ int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
+ int (*mremap)(struct vm_area_struct *vma);
/*
* Called by mprotect() to make driver-specific permission
* checks before mprotect() is finalised. The VMA must not
@@ -768,7 +773,7 @@ struct vm_operations_struct {
vm_fault_t (*huge_fault)(struct vm_fault *vmf, unsigned int order);
vm_fault_t (*map_pages)(struct vm_fault *vmf,
pgoff_t start_pgoff, pgoff_t end_pgoff);
- unsigned long (*pagesize)(struct vm_area_struct * area);
+ unsigned long (*pagesize)(struct vm_area_struct *vma);
/* notification that a previously read-only page is about to become
* writable, if an error is returned it will cause a SIGBUS */
--
2.53.0
* [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (2 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 03/15] mm: document vm_operations_struct->open the same as close() Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-13 11:02 ` Usama Arif
2026-03-12 20:27 ` [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure Lorenzo Stoakes (Oracle)
` (11 subsequent siblings)
15 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
Previously, when a driver needed to do something like establish a reference
count, it could do so in the mmap hook in the knowledge that the mapping
would succeed.
With the introduction of f_op->mmap_prepare this is no longer the case, as
it is invoked prior to actually establishing the mapping.
To take this into account, introduce a new vm_ops->mapped callback which is
invoked when the VMA is first mapped (though, notably, not when it is
merged, which is correct and mirrors existing mmap/open/close behaviour).
We do better than vm_ops->open() here, as this callback can return an
error, in which case the VMA will be unmapped.
Note that vm_ops->mapped() is invoked after any mmap action is
complete (such as I/O remapping).
We intentionally do not expose the VMA at this point, exposing only the
fields that could be used, and an output parameter in case the operation
needs to update the vma->vm_private_data field.
In order to deal with stacked filesystems whose mmap hooks invoke an inner
filesystem's mmap hook, add __compat_vma_mapped() and invoke it from
vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
callback.
We can now also remove call_action_complete() and invoke
mmap_action_complete() directly, as we separate out the rmap lock logic
into maybe_drop_file_rmap_lock(), called from __mmap_region() instead.
We also abstract unmapping of a VMA on mmap action completion into its own
helper function, unmap_vma_locked().
Additionally, update VMA userland test headers to reflect the change.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
include/linux/fs.h | 9 +++-
include/linux/mm.h | 17 +++++++
mm/internal.h | 10 ++++
mm/util.c | 86 ++++++++++++++++++++++++---------
mm/vma.c | 41 +++++++++++-----
tools/testing/vma/include/dup.h | 34 ++++++++++++-
6 files changed, 158 insertions(+), 39 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a2628a12bd2b..c390f5c667e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
}
int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
+int __vma_check_mmap_hook(struct vm_area_struct *vma);
static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
{
+ int err;
+
if (file->f_op->mmap_prepare)
return compat_vma_mmap(file, vma);
- return file->f_op->mmap(file, vma);
+ err = file->f_op->mmap(file, vma);
+ if (err)
+ return err;
+
+ return __vma_check_mmap_hook(vma);
}
static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 12a0b4c63736..7333d5db1221 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -759,6 +759,23 @@ struct vm_operations_struct {
* Context: User context. May sleep. Caller holds mmap_lock.
*/
void (*close)(struct vm_area_struct *vma);
+ /**
+ * @mapped: Called when the VMA is first mapped in the MM. Not called if
+ * the new VMA is merged with an adjacent VMA.
+ *
+ * The @vm_private_data field is an output field allowing the user to
+ * modify vma->vm_private_data as necessary.
+ *
+ * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
+ * set from f_op->mmap.
+ *
+ * Returns %0 on success, or an error otherwise. On error, the VMA will
+ * be unmapped.
+ *
+ * Context: User context. May sleep. Caller holds mmap_lock.
+ */
+ int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data);
/* Called any time before splitting to check if it's allowed */
int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
int (*mremap)(struct vm_area_struct *vma);
diff --git a/mm/internal.h b/mm/internal.h
index 7bfa85b5e78b..f0f2cf1caa36 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
* mmap hook and safely handle error conditions. On error, VMA hooks will be
* mutated.
*
+ * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
+ *
* @file: File which backs the mapping.
* @vma: VMA which we are mapping.
*
@@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
/* unmap_vmas is in mm/memory.c */
void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
+static inline void unmap_vma_locked(struct vm_area_struct *vma)
+{
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ mmap_assert_locked(vma->vm_mm);
+ do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
+}
+
#ifdef CONFIG_MMU
static inline void get_anon_vma(struct anon_vma *anon_vma)
diff --git a/mm/util.c b/mm/util.c
index dba1191725b6..2b0ed54008d6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
EXPORT_SYMBOL(flush_dcache_folio);
#endif
+static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct vm_area_desc desc = {
+ .mm = vma->vm_mm,
+ .file = file,
+ .start = vma->vm_start,
+ .end = vma->vm_end,
+
+ .pgoff = vma->vm_pgoff,
+ .vm_file = vma->vm_file,
+ .vma_flags = vma->flags,
+ .page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
+ };
+ int err;
+
+ err = vfs_mmap_prepare(file, &desc);
+ if (err)
+ return err;
+
+ err = mmap_action_prepare(&desc, &desc.action);
+ if (err)
+ return err;
+
+ set_vma_from_desc(vma, &desc);
+ return mmap_action_complete(vma, &desc.action);
+}
+
+static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
+{
+ const struct vm_operations_struct *vm_ops = vma->vm_ops;
+ void *vm_private_data = vma->vm_private_data;
+ int err;
+
+ if (!vm_ops->mapped)
+ return 0;
+
+ err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
+ &vm_private_data);
+ if (err)
+ unmap_vma_locked(vma);
+ /* Update private data if changed. */
+ if (vm_private_data != vma->vm_private_data)
+ vma->vm_private_data = vm_private_data;
+
+ return err;
+}
+
/**
* compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
* existing VMA and execute any requested actions.
@@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
*/
int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
{
- struct vm_area_desc desc = {
- .mm = vma->vm_mm,
- .file = file,
- .start = vma->vm_start,
- .end = vma->vm_end,
-
- .pgoff = vma->vm_pgoff,
- .vm_file = vma->vm_file,
- .vma_flags = vma->flags,
- .page_prot = vma->vm_page_prot,
-
- .action.type = MMAP_NOTHING, /* Default */
- };
int err;
- err = vfs_mmap_prepare(file, &desc);
- if (err)
- return err;
-
- err = mmap_action_prepare(&desc, &desc.action);
+ err = __compat_vma_mmap(file, vma);
if (err)
return err;
- set_vma_from_desc(vma, &desc);
- return mmap_action_complete(vma, &desc.action);
+ return __compat_vma_mapped(file, vma);
}
EXPORT_SYMBOL(compat_vma_mmap);
+int __vma_check_mmap_hook(struct vm_area_struct *vma)
+{
+ /* vm_ops->mapped is not valid if mmap() is specified. */
+ if (WARN_ON_ONCE(vma->vm_ops->mapped))
+ return -EINVAL;
+
+ return 0;
+}
+EXPORT_SYMBOL(__vma_check_mmap_hook);
+
static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
const struct page *page)
{
@@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
* invoked if we do NOT merge, so we only clean up the VMA we created.
*/
if (err) {
- const size_t len = vma_pages(vma) << PAGE_SHIFT;
-
- do_munmap(current->mm, vma->vm_start, len, NULL);
-
+ unmap_vma_locked(vma);
if (action->error_hook) {
/* We may want to filter the error. */
err = action->error_hook(err);
diff --git a/mm/vma.c b/mm/vma.c
index 054cf1d262fb..ef9f5a5365d1 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
return false;
}
-static int call_action_complete(struct mmap_state *map,
- struct mmap_action *action,
- struct vm_area_struct *vma)
+static int call_mapped_hook(struct vm_area_struct *vma)
{
- int ret;
+ const struct vm_operations_struct *vm_ops = vma->vm_ops;
+ void *vm_private_data = vma->vm_private_data;
+ int err;
- ret = mmap_action_complete(vma, action);
+ if (!vm_ops || !vm_ops->mapped)
+ return 0;
+ err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
+ vma->vm_file, &vm_private_data);
+ if (err) {
+ unmap_vma_locked(vma);
+ return err;
+ }
+ /* Update private data if changed. */
+ if (vm_private_data != vma->vm_private_data)
+ vma->vm_private_data = vm_private_data;
+ return 0;
+}
- /* If we held the file rmap we need to release it. */
- if (map->hold_file_rmap_lock) {
- struct file *file = vma->vm_file;
+static void maybe_drop_file_rmap_lock(struct mmap_state *map,
+ struct vm_area_struct *vma)
+{
+ struct file *file;
- i_mmap_unlock_write(file->f_mapping);
- }
- return ret;
+ if (!map->hold_file_rmap_lock)
+ return;
+ file = vma->vm_file;
+ i_mmap_unlock_write(file->f_mapping);
}
static unsigned long __mmap_region(struct file *file, unsigned long addr,
@@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
__mmap_complete(&map, vma);
if (have_mmap_prepare && allocated_new) {
- error = call_action_complete(&map, &desc.action, vma);
+ error = mmap_action_complete(vma, &desc.action);
+ if (!error)
+ error = call_mapped_hook(vma);
+ maybe_drop_file_rmap_lock(&map, vma);
if (error)
return error;
}
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 908beb263307..47d8db809f31 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -606,12 +606,34 @@ struct vm_area_struct {
} __randomize_layout;
struct vm_operations_struct {
- void (*open)(struct vm_area_struct * area);
+ /**
+ * @open: Called when a VMA is remapped or split. Not called upon first
+ * mapping a VMA.
+ * Context: User context. May sleep. Caller holds mmap_lock.
+ */
+ void (*open)(struct vm_area_struct *vma);
/**
* @close: Called when the VMA is being removed from the MM.
* Context: User context. May sleep. Caller holds mmap_lock.
*/
- void (*close)(struct vm_area_struct * area);
+ void (*close)(struct vm_area_struct *vma);
+ /**
+ * @mapped: Called when the VMA is first mapped in the MM. Not called if
+ * the new VMA is merged with an adjacent VMA.
+ *
+ * The @vm_private_data field is an output field allowing the user to
+ * modify vma->vm_private_data as necessary.
+ *
+ * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
+ * set from f_op->mmap.
+ *
+ * Returns %0 on success, or an error otherwise. On error, the VMA will
+ * be unmapped.
+ *
+ * Context: User context. May sleep. Caller holds mmap_lock.
+ */
+ int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data);
/* Called any time before splitting to check if it's allowed */
int (*may_split)(struct vm_area_struct *area, unsigned long addr);
int (*mremap)(struct vm_area_struct *area);
@@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
swap(vma->vm_file, file);
fput(file);
}
+
+static inline void unmap_vma_locked(struct vm_area_struct *vma)
+{
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ mmap_assert_locked(vma->vm_mm);
+ do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
+}
--
2.53.0
* [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (3 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 04/15] mm: add vm_ops->mapped hook Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-13 11:07 ` Usama Arif
2026-03-12 20:27 ` [PATCH 06/15] mm: add mmap_action_simple_ioremap() Lorenzo Stoakes (Oracle)
` (10 subsequent siblings)
15 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
.mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
the deprecated mmap callback.
However, it did not account for the fact that the mapping can still fail
after mmap_prepare succeeds (for instance due to an out of memory error),
and thus the reference count should not be incremented in mmap_prepare.
With the newly added vm_ops->mapped callback available, we can simply defer
this operation to that callback which is only invoked once the mapping is
successfully in place (but not yet visible to userspace as the mmap and VMA
write locks are held).
Therefore add afs_mapped() to implement this callback for AFS.
In practice the mapping allocations are 'too small to fail', so this is
something that realistically should never happen (or would only do so when
the process is about to die anyway), but we should still handle it.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
fs/afs/file.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/fs/afs/file.c b/fs/afs/file.c
index f609366fd2ac..69ef86f5e274 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
static void afs_vm_open(struct vm_area_struct *area);
static void afs_vm_close(struct vm_area_struct *area);
static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
+static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data);
const struct file_operations afs_file_operations = {
.open = afs_open,
@@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
};
static const struct vm_operations_struct afs_vm_ops = {
+ .mapped = afs_mapped,
.open = afs_vm_open,
.close = afs_vm_close,
.fault = filemap_fault,
@@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
afs_add_open_mmap(vnode);
ret = generic_file_mmap_prepare(desc);
- if (ret == 0)
- desc->vm_ops = &afs_vm_ops;
- else
- afs_drop_open_mmap(vnode);
+ if (ret)
+ return ret;
+
+ desc->vm_ops = &afs_vm_ops;
return ret;
}
+static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data)
+{
+ struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+
+ afs_add_open_mmap(vnode);
+ return 0;
+}
+
static void afs_vm_open(struct vm_area_struct *vma)
{
afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
--
2.53.0
* [PATCH 06/15] mm: add mmap_action_simple_ioremap()
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (4 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 07/15] misc: open-dice: replace deprecated mmap hook with mmap_prepare Lorenzo Stoakes (Oracle)
` (9 subsequent siblings)
15 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
Currently drivers use vm_iomap_memory() as a simple helper function for I/O
remapping memory over a range starting at a specified physical address over
a specified length.
In order to utilise this from mmap_prepare, separate out the core logic
into __simple_ioremap_prep(), update vm_iomap_memory() to use it, and add
simple_ioremap_prepare() to do the same with a VMA descriptor object.
We also add MMAP_SIMPLE_IO_REMAP and the relevant fields to the struct
mmap_action type to permit this operation.
We use mmap_action_ioremap() to set up the actual I/O remap operation once
we have checked and figured out the parameters, which makes
simple_ioremap_prepare() easy to implement.
We then add mmap_action_simple_ioremap() to allow drivers to make use of
this mode.
We update the mmap_prepare documentation to describe this mode.
Finally, we update the VMA tests to reflect this change.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
Documentation/filesystems/mmap_prepare.rst | 2 +
include/linux/mm.h | 24 +++++-
include/linux/mm_types.h | 6 +-
mm/internal.h | 3 +
mm/memory.c | 87 +++++++++++++++-------
mm/util.c | 12 +++
tools/testing/vma/include/dup.h | 6 +-
7 files changed, 112 insertions(+), 28 deletions(-)
diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
index 76908200f3a1..d21406848bca 100644
--- a/Documentation/filesystems/mmap_prepare.rst
+++ b/Documentation/filesystems/mmap_prepare.rst
@@ -126,6 +126,8 @@ pointer. These are:
* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
the entire mapping from `start_pfn` onward.
+* `mmap_action_simple_ioremap()` - Sets up an I/O remap from a specified
+ physical address and over a specified length.
**NOTE:** The 'action' field should never normally be manipulated directly,
rather you ought to use one of these helpers.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7333d5db1221..88f42faeb377 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4133,11 +4133,33 @@ static inline void mmap_action_ioremap(struct vm_area_desc *desc,
* @start_pfn: The first PFN in the range to remap.
*/
static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
- unsigned long start_pfn)
+ unsigned long start_pfn)
{
mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
}
+/**
+ * mmap_action_simple_ioremap - helper for mmap_prepare hook to specify that the
+ * physical range in [start_phys_addr, start_phys_addr + size) should be I/O
+ * remapped.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start_phys_addr: Start of the physical memory to be mapped.
+ * @size: Size of the area to map.
+ *
+ * NOTE: Some drivers might want to tweak desc->page_prot for purposes of
+ * write-combine or similar.
+ */
+static inline void mmap_action_simple_ioremap(struct vm_area_desc *desc,
+ phys_addr_t start_phys_addr,
+ unsigned long size)
+{
+ struct mmap_action *action = &desc->action;
+
+ action->simple_ioremap.start_phys_addr = start_phys_addr;
+ action->simple_ioremap.size = size;
+ action->type = MMAP_SIMPLE_IO_REMAP;
+}
+
int mmap_action_prepare(struct vm_area_desc *desc,
struct mmap_action *action);
int mmap_action_complete(struct vm_area_struct *vma,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3944b51ebac6..1c94db0fcfb4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -814,6 +814,7 @@ enum mmap_action_type {
MMAP_NOTHING, /* Mapping is complete, no further action. */
MMAP_REMAP_PFN, /* Remap PFN range. */
MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
+ MMAP_SIMPLE_IO_REMAP, /* I/O remap with guardrails. */
};
/*
@@ -822,13 +823,16 @@ enum mmap_action_type {
*/
struct mmap_action {
union {
- /* Remap range. */
struct {
unsigned long start;
unsigned long start_pfn;
unsigned long size;
pgprot_t pgprot;
} remap;
+ struct {
+ phys_addr_t start_phys_addr;
+ unsigned long size;
+ } simple_ioremap;
};
enum mmap_action_type type;
diff --git a/mm/internal.h b/mm/internal.h
index f0f2cf1caa36..2509fd952f4c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1789,6 +1789,9 @@ int remap_pfn_range_prepare(struct vm_area_desc *desc,
struct mmap_action *action);
int remap_pfn_range_complete(struct vm_area_struct *vma,
struct mmap_action *action);
+int simple_ioremap_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action);
+/* No simple_ioremap_complete, is ultimately handled by remap complete. */
static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
struct mmap_action *action)
diff --git a/mm/memory.c b/mm/memory.c
index 364fa8a45360..351cc917b7aa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3170,6 +3170,59 @@ int remap_pfn_range_complete(struct vm_area_struct *vma,
return do_remap_pfn_range(vma, start, pfn, size, prot);
}
+static int __simple_ioremap_prep(unsigned long vm_start, unsigned long vm_end,
+ pgoff_t vm_pgoff, phys_addr_t start_phys,
+ unsigned long size, unsigned long *pfnp)
+{
+ const unsigned long vm_len = vm_end - vm_start;
+ unsigned long pfn, pages;
+
+ /* Check that the physical memory area passed in looks valid */
+ if (start_phys + size < start_phys)
+ return -EINVAL;
+ /*
+ * You *really* shouldn't map things that aren't page-aligned,
+ * but we've historically allowed it because IO memory might
+ * just have smaller alignment.
+ */
+ size += start_phys & ~PAGE_MASK;
+ pfn = start_phys >> PAGE_SHIFT;
+ pages = (size + ~PAGE_MASK) >> PAGE_SHIFT;
+ if (pfn + pages < pfn)
+ return -EINVAL;
+
+ /* We start the mapping 'vm_pgoff' pages into the area */
+ if (vm_pgoff > pages)
+ return -EINVAL;
+ pfn += vm_pgoff;
+ pages -= vm_pgoff;
+
+ /* Can we fit all of the mapping? */
+ if ((vm_len >> PAGE_SHIFT) > pages)
+ return -EINVAL;
+
+ *pfnp = pfn;
+ return 0;
+}
+
+int simple_ioremap_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
+{
+ const phys_addr_t start = action->simple_ioremap.start_phys_addr;
+ const unsigned long size = action->simple_ioremap.size;
+ unsigned long pfn;
+ int err;
+
+ err = __simple_ioremap_prep(desc->start, desc->end, desc->pgoff,
+ start, size, &pfn);
+ if (err)
+ return err;
+
+ /* The I/O remap logic does the heavy lifting. */
+ mmap_action_ioremap(desc, desc->start, pfn, vma_desc_size(desc));
+ return mmap_action_prepare(desc, &desc->action);
+}
+
/**
* vm_iomap_memory - remap memory to userspace
* @vma: user vma to map to
@@ -3187,32 +3240,16 @@ int remap_pfn_range_complete(struct vm_area_struct *vma,
*/
int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len)
{
- unsigned long vm_len, pfn, pages;
-
- /* Check that the physical memory area passed in looks valid */
- if (start + len < start)
- return -EINVAL;
- /*
- * You *really* shouldn't map things that aren't page-aligned,
- * but we've historically allowed it because IO memory might
- * just have smaller alignment.
- */
- len += start & ~PAGE_MASK;
- pfn = start >> PAGE_SHIFT;
- pages = (len + ~PAGE_MASK) >> PAGE_SHIFT;
- if (pfn + pages < pfn)
- return -EINVAL;
-
- /* We start the mapping 'vm_pgoff' pages into the area */
- if (vma->vm_pgoff > pages)
- return -EINVAL;
- pfn += vma->vm_pgoff;
- pages -= vma->vm_pgoff;
+ const unsigned long vm_start = vma->vm_start;
+ const unsigned long vm_end = vma->vm_end;
+ const unsigned long vm_len = vm_end - vm_start;
+ unsigned long pfn;
+ int err;
- /* Can we fit all of the mapping? */
- vm_len = vma->vm_end - vma->vm_start;
- if (vm_len >> PAGE_SHIFT > pages)
- return -EINVAL;
+ err = __simple_ioremap_prep(vm_start, vm_end, vma->vm_pgoff, start,
+ len, &pfn);
+ if (err)
+ return err;
/* Ok, let it rip */
return io_remap_pfn_range(vma, vma->vm_start, pfn, vm_len, vma->vm_page_prot);
diff --git a/mm/util.c b/mm/util.c
index 2b0ed54008d6..3205bb9ab5d2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1394,6 +1394,8 @@ int mmap_action_prepare(struct vm_area_desc *desc,
return remap_pfn_range_prepare(desc, action);
case MMAP_IO_REMAP_PFN:
return io_remap_pfn_range_prepare(desc, action);
+ case MMAP_SIMPLE_IO_REMAP:
+ return simple_ioremap_prepare(desc, action);
}
}
EXPORT_SYMBOL(mmap_action_prepare);
@@ -1422,6 +1424,14 @@ int mmap_action_complete(struct vm_area_struct *vma,
case MMAP_IO_REMAP_PFN:
err = io_remap_pfn_range_complete(vma, action);
break;
+ case MMAP_SIMPLE_IO_REMAP:
+ /*
+ * The simple I/O remap should have been delegated to an I/O
+ * remap.
+ */
+ WARN_ON_ONCE(1);
+ err = -EINVAL;
+ break;
}
return mmap_action_finish(vma, action, err);
@@ -1436,6 +1446,7 @@ int mmap_action_prepare(struct vm_area_desc *desc,
break;
case MMAP_REMAP_PFN:
case MMAP_IO_REMAP_PFN:
+ case MMAP_SIMPLE_IO_REMAP:
WARN_ON_ONCE(1); /* nommu cannot handle these. */
break;
}
@@ -1454,6 +1465,7 @@ int mmap_action_complete(struct vm_area_struct *vma,
break;
case MMAP_REMAP_PFN:
case MMAP_IO_REMAP_PFN:
+ case MMAP_SIMPLE_IO_REMAP:
WARN_ON_ONCE(1); /* nommu cannot handle this. */
err = -EINVAL;
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 47d8db809f31..f95c4b8af03c 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -424,6 +424,7 @@ enum mmap_action_type {
MMAP_NOTHING, /* Mapping is complete, no further action. */
MMAP_REMAP_PFN, /* Remap PFN range. */
MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
+ MMAP_SIMPLE_IO_REMAP, /* I/O remap with guardrails. */
};
/*
@@ -432,13 +433,16 @@ enum mmap_action_type {
*/
struct mmap_action {
union {
- /* Remap range. */
struct {
unsigned long start;
unsigned long start_pfn;
unsigned long size;
pgprot_t pgprot;
} remap;
+ struct {
+ phys_addr_t start_phys_addr;
+ unsigned long size;
+ } simple_ioremap;
};
enum mmap_action_type type;
--
2.53.0
* [PATCH 07/15] misc: open-dice: replace deprecated mmap hook with mmap_prepare
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (5 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 06/15] mm: add mmap_action_simple_ioremap() Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 08/15] hpet: " Lorenzo Stoakes (Oracle)
` (8 subsequent siblings)
15 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
The f_op->mmap interface is deprecated, so update the driver to use its
successor, mmap_prepare.
The driver previously used vm_iomap_memory(), so this change replaces it
with its mmap_prepare equivalent, mmap_action_simple_ioremap().
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
drivers/misc/open-dice.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/misc/open-dice.c b/drivers/misc/open-dice.c
index 24c29e0f00ef..45060fb4ea27 100644
--- a/drivers/misc/open-dice.c
+++ b/drivers/misc/open-dice.c
@@ -86,29 +86,32 @@ static ssize_t open_dice_write(struct file *filp, const char __user *ptr,
/*
* Creates a mapping of the reserved memory region in user address space.
*/
-static int open_dice_mmap(struct file *filp, struct vm_area_struct *vma)
+static int open_dice_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *filp = desc->file;
struct open_dice_drvdata *drvdata = to_open_dice_drvdata(filp);
- if (vma->vm_flags & VM_MAYSHARE) {
+ if (vma_desc_test(desc, VMA_MAYSHARE_BIT)) {
/* Do not allow userspace to modify the underlying data. */
- if (vma->vm_flags & VM_WRITE)
+ if (vma_desc_test(desc, VMA_WRITE_BIT))
return -EPERM;
/* Ensure userspace cannot acquire VM_WRITE later. */
- vm_flags_clear(vma, VM_MAYWRITE);
+ vma_desc_clear_flags(desc, VMA_MAYWRITE_BIT);
}
/* Create write-combine mapping so all clients observe a wipe. */
- vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
- vm_flags_set(vma, VM_DONTCOPY | VM_DONTDUMP);
- return vm_iomap_memory(vma, drvdata->rmem->base, drvdata->rmem->size);
+ desc->page_prot = pgprot_writecombine(desc->page_prot);
+ vma_desc_set_flags(desc, VMA_DONTCOPY_BIT, VMA_DONTDUMP_BIT);
+ mmap_action_simple_ioremap(desc, drvdata->rmem->base,
+ drvdata->rmem->size);
+ return 0;
}
static const struct file_operations open_dice_fops = {
.owner = THIS_MODULE,
.read = open_dice_read,
.write = open_dice_write,
- .mmap = open_dice_mmap,
+ .mmap_prepare = open_dice_mmap_prepare,
};
static int __init open_dice_probe(struct platform_device *pdev)
--
2.53.0
* [PATCH 08/15] hpet: replace deprecated mmap hook with mmap_prepare
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (6 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 07/15] misc: open-dice: replace deprecated mmap hook with mmap_prepare Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 09/15] mtdchar: replace deprecated mmap hook with mmap_prepare, clean up Lorenzo Stoakes (Oracle)
` (7 subsequent siblings)
15 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
The f_op->mmap interface is deprecated, so update the driver to use its
successor, mmap_prepare.
The driver previously used vm_iomap_memory(), so this change replaces it
with its mmap_prepare equivalent, mmap_action_simple_ioremap().
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
drivers/char/hpet.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 60dd09a56f50..8f128cc40147 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -354,8 +354,9 @@ static __init int hpet_mmap_enable(char *str)
}
__setup("hpet_mmap=", hpet_mmap_enable);
-static int hpet_mmap(struct file *file, struct vm_area_struct *vma)
+static int hpet_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *file = desc->file;
struct hpet_dev *devp;
unsigned long addr;
@@ -368,11 +369,12 @@ static int hpet_mmap(struct file *file, struct vm_area_struct *vma)
if (addr & (PAGE_SIZE - 1))
return -ENOSYS;
- vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
- return vm_iomap_memory(vma, addr, PAGE_SIZE);
+ desc->page_prot = pgprot_noncached(desc->page_prot);
+ mmap_action_simple_ioremap(desc, addr, PAGE_SIZE);
+ return 0;
}
#else
-static int hpet_mmap(struct file *file, struct vm_area_struct *vma)
+static int hpet_mmap_prepare(struct vm_area_desc *desc)
{
return -ENOSYS;
}
@@ -710,7 +712,7 @@ static const struct file_operations hpet_fops = {
.open = hpet_open,
.release = hpet_release,
.fasync = hpet_fasync,
- .mmap = hpet_mmap,
+ .mmap_prepare = hpet_mmap_prepare,
};
static int hpet_is_known(struct hpet_data *hdp)
--
2.53.0
* [PATCH 09/15] mtdchar: replace deprecated mmap hook with mmap_prepare, clean up
Replace the deprecated mmap callback with mmap_prepare.
Commit f5cf8f07423b ("mtd: Disable mtdchar mmap on MMU systems") commented
out the CONFIG_MMU part of this function back in 2012, so after ~14 years
it's probably reasonable to remove this altogether rather than updating
dead code.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
drivers/mtd/mtdchar.c | 21 +++------------------
1 file changed, 3 insertions(+), 18 deletions(-)
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 55a43682c567..816ab1ae8b8d 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -1376,27 +1376,12 @@ static unsigned mtdchar_mmap_capabilities(struct file *file)
/*
* set up a mapping for shared memory segments
*/
-static int mtdchar_mmap(struct file *file, struct vm_area_struct *vma)
+static int mtdchar_mmap_prepare(struct vm_area_desc *desc)
{
#ifdef CONFIG_MMU
- struct mtd_file_info *mfi = file->private_data;
- struct mtd_info *mtd = mfi->mtd;
- struct map_info *map = mtd->priv;
-
- /* This is broken because it assumes the MTD device is map-based
- and that mtd->priv is a valid struct map_info. It should be
- replaced with something that uses the mtd_get_unmapped_area()
- operation properly. */
- if (0 /*mtd->type == MTD_RAM || mtd->type == MTD_ROM*/) {
-#ifdef pgprot_noncached
- if (file->f_flags & O_DSYNC || map->phys >= __pa(high_memory))
- vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-#endif
- return vm_iomap_memory(vma, map->phys, map->size);
- }
return -ENODEV;
#else
- return vma->vm_flags & VM_SHARED ? 0 : -EACCES;
+ return vma_desc_test_flags(desc, VMA_SHARED_BIT) ? 0 : -EACCES;
#endif
}
@@ -1411,7 +1396,7 @@ static const struct file_operations mtd_fops = {
#endif
.open = mtdchar_open,
.release = mtdchar_close,
- .mmap = mtdchar_mmap,
+ .mmap_prepare = mtdchar_mmap_prepare,
#ifndef CONFIG_MMU
.get_unmapped_area = mtdchar_get_unmapped_area,
.mmap_capabilities = mtdchar_mmap_capabilities,
--
2.53.0
* [PATCH 10/15] stm: replace deprecated mmap hook with mmap_prepare
The f_op->mmap interface is deprecated, so update the driver to use its
successor, mmap_prepare.
The driver previously used vm_iomap_memory(), so this change replaces it
with its mmap_prepare equivalent, mmap_action_simple_ioremap().
Also, in order to correctly maintain reference counting, add a
vm_ops->mapped callback to increment the reference count when successfully
mapped.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
drivers/hwtracing/stm/core.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 37584e786bb5..f48c6a8a0654 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -666,6 +666,16 @@ static ssize_t stm_char_write(struct file *file, const char __user *buf,
return count;
}
+static int stm_mmap_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data)
+{
+ struct stm_file *stmf = file->private_data;
+ struct stm_device *stm = stmf->stm;
+
+ pm_runtime_get_sync(&stm->dev);
+ return 0;
+}
+
static void stm_mmap_open(struct vm_area_struct *vma)
{
struct stm_file *stmf = vma->vm_file->private_data;
@@ -684,12 +694,14 @@ static void stm_mmap_close(struct vm_area_struct *vma)
}
static const struct vm_operations_struct stm_mmap_vmops = {
+ .mapped = stm_mmap_mapped,
.open = stm_mmap_open,
.close = stm_mmap_close,
};
-static int stm_char_mmap(struct file *file, struct vm_area_struct *vma)
+static int stm_char_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *file = desc->file;
struct stm_file *stmf = file->private_data;
struct stm_device *stm = stmf->stm;
unsigned long size, phys;
@@ -697,10 +709,10 @@ static int stm_char_mmap(struct file *file, struct vm_area_struct *vma)
if (!stm->data->mmio_addr)
return -EOPNOTSUPP;
- if (vma->vm_pgoff)
+ if (desc->pgoff)
return -EINVAL;
- size = vma->vm_end - vma->vm_start;
+ size = vma_desc_size(desc);
if (stmf->output.nr_chans * stm->data->sw_mmiosz != size)
return -EINVAL;
@@ -712,13 +724,12 @@ static int stm_char_mmap(struct file *file, struct vm_area_struct *vma)
if (!phys)
return -EINVAL;
- pm_runtime_get_sync(&stm->dev);
-
- vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
- vm_flags_set(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
- vma->vm_ops = &stm_mmap_vmops;
- vm_iomap_memory(vma, phys, size);
+ desc->page_prot = pgprot_noncached(desc->page_prot);
+ vma_desc_set_flags(desc, VMA_IO_BIT, VMA_DONTEXPAND_BIT,
+ VMA_DONTDUMP_BIT);
+ desc->vm_ops = &stm_mmap_vmops;
+ mmap_action_simple_ioremap(desc, phys, size);
return 0;
}
@@ -836,7 +847,7 @@ static const struct file_operations stm_fops = {
.open = stm_char_open,
.release = stm_char_release,
.write = stm_char_write,
- .mmap = stm_char_mmap,
+ .mmap_prepare = stm_char_mmap_prepare,
.unlocked_ioctl = stm_char_ioctl,
.compat_ioctl = compat_ptr_ioctl,
};
--
2.53.0
* [PATCH 11/15] staging: vme_user: replace deprecated mmap hook with mmap_prepare
The f_op->mmap interface is deprecated, so update the driver to use its
successor, mmap_prepare.
The driver previously used vm_iomap_memory(), so this change replaces it
with its mmap_prepare equivalent, mmap_action_simple_ioremap().
Functions that wrap mmap() are also converted to wrap mmap_prepare()
instead.
Also update the documentation accordingly.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
Documentation/driver-api/vme.rst | 2 +-
drivers/staging/vme_user/vme.c | 20 +++++------
drivers/staging/vme_user/vme.h | 2 +-
drivers/staging/vme_user/vme_user.c | 51 +++++++++++++++++------------
4 files changed, 42 insertions(+), 33 deletions(-)
diff --git a/Documentation/driver-api/vme.rst b/Documentation/driver-api/vme.rst
index c0b475369de0..7111999abc14 100644
--- a/Documentation/driver-api/vme.rst
+++ b/Documentation/driver-api/vme.rst
@@ -107,7 +107,7 @@ The function :c:func:`vme_master_read` can be used to read from and
In addition to simple reads and writes, :c:func:`vme_master_rmw` is provided to
do a read-modify-write transaction. Parts of a VME window can also be mapped
-into user space memory using :c:func:`vme_master_mmap`.
+into user space memory using :c:func:`vme_master_mmap_prepare`.
Slave windows
diff --git a/drivers/staging/vme_user/vme.c b/drivers/staging/vme_user/vme.c
index f10a00c05f12..7220aba7b919 100644
--- a/drivers/staging/vme_user/vme.c
+++ b/drivers/staging/vme_user/vme.c
@@ -735,9 +735,9 @@ unsigned int vme_master_rmw(struct vme_resource *resource, unsigned int mask,
EXPORT_SYMBOL(vme_master_rmw);
/**
- * vme_master_mmap - Mmap region of VME master window.
+ * vme_master_mmap_prepare - Mmap region of VME master window.
* @resource: Pointer to VME master resource.
- * @vma: Pointer to definition of user mapping.
+ * @desc: Pointer to descriptor of user mapping.
*
* Memory map a region of the VME master window into user space.
*
@@ -745,12 +745,13 @@ EXPORT_SYMBOL(vme_master_rmw);
* resource or -EFAULT if map exceeds window size. Other generic mmap
* errors may also be returned.
*/
-int vme_master_mmap(struct vme_resource *resource, struct vm_area_struct *vma)
+int vme_master_mmap_prepare(struct vme_resource *resource,
+ struct vm_area_desc *desc)
{
+ const unsigned long vma_size = vma_desc_size(desc);
struct vme_bridge *bridge = find_bridge(resource);
struct vme_master_resource *image;
phys_addr_t phys_addr;
- unsigned long vma_size;
if (resource->type != VME_MASTER) {
dev_err(bridge->parent, "Not a master resource\n");
@@ -758,19 +759,18 @@ int vme_master_mmap(struct vme_resource *resource, struct vm_area_struct *vma)
}
image = list_entry(resource->entry, struct vme_master_resource, list);
- phys_addr = image->bus_resource.start + (vma->vm_pgoff << PAGE_SHIFT);
- vma_size = vma->vm_end - vma->vm_start;
+ phys_addr = image->bus_resource.start + (desc->pgoff << PAGE_SHIFT);
if (phys_addr + vma_size > image->bus_resource.end + 1) {
dev_err(bridge->parent, "Map size cannot exceed the window size\n");
return -EFAULT;
}
- vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-
- return vm_iomap_memory(vma, phys_addr, vma->vm_end - vma->vm_start);
+ desc->page_prot = pgprot_noncached(desc->page_prot);
+ mmap_action_simple_ioremap(desc, phys_addr, vma_size);
+ return 0;
}
-EXPORT_SYMBOL(vme_master_mmap);
+EXPORT_SYMBOL(vme_master_mmap_prepare);
/**
* vme_master_free - Free VME master window
diff --git a/drivers/staging/vme_user/vme.h b/drivers/staging/vme_user/vme.h
index 797e9940fdd1..b6413605ea49 100644
--- a/drivers/staging/vme_user/vme.h
+++ b/drivers/staging/vme_user/vme.h
@@ -151,7 +151,7 @@ ssize_t vme_master_read(struct vme_resource *resource, void *buf, size_t count,
ssize_t vme_master_write(struct vme_resource *resource, void *buf, size_t count, loff_t offset);
unsigned int vme_master_rmw(struct vme_resource *resource, unsigned int mask, unsigned int compare,
unsigned int swap, loff_t offset);
-int vme_master_mmap(struct vme_resource *resource, struct vm_area_struct *vma);
+int vme_master_mmap_prepare(struct vme_resource *resource, struct vm_area_desc *desc);
void vme_master_free(struct vme_resource *resource);
struct vme_resource *vme_dma_request(struct vme_dev *vdev, u32 route);
diff --git a/drivers/staging/vme_user/vme_user.c b/drivers/staging/vme_user/vme_user.c
index d95dd7d9190a..11e25c2f6b0a 100644
--- a/drivers/staging/vme_user/vme_user.c
+++ b/drivers/staging/vme_user/vme_user.c
@@ -446,24 +446,14 @@ static void vme_user_vm_close(struct vm_area_struct *vma)
kfree(vma_priv);
}
-static const struct vm_operations_struct vme_user_vm_ops = {
- .open = vme_user_vm_open,
- .close = vme_user_vm_close,
-};
-
-static int vme_user_master_mmap(unsigned int minor, struct vm_area_struct *vma)
+static int vme_user_vm_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data)
{
- int err;
+ const unsigned int minor = iminor(file_inode(file));
struct vme_user_vma_priv *vma_priv;
mutex_lock(&image[minor].mutex);
- err = vme_master_mmap(image[minor].resource, vma);
- if (err) {
- mutex_unlock(&image[minor].mutex);
- return err;
- }
-
vma_priv = kmalloc_obj(*vma_priv);
if (!vma_priv) {
mutex_unlock(&image[minor].mutex);
@@ -472,22 +462,41 @@ static int vme_user_master_mmap(unsigned int minor, struct vm_area_struct *vma)
vma_priv->minor = minor;
refcount_set(&vma_priv->refcnt, 1);
- vma->vm_ops = &vme_user_vm_ops;
- vma->vm_private_data = vma_priv;
-
+ *vm_private_data = vma_priv;
image[minor].mmap_count++;
mutex_unlock(&image[minor].mutex);
-
return 0;
}
-static int vme_user_mmap(struct file *file, struct vm_area_struct *vma)
+static const struct vm_operations_struct vme_user_vm_ops = {
+ .mapped = vme_user_vm_mapped,
+ .open = vme_user_vm_open,
+ .close = vme_user_vm_close,
+};
+
+static int vme_user_master_mmap_prepare(unsigned int minor,
+ struct vm_area_desc *desc)
+{
+ int err;
+
+ mutex_lock(&image[minor].mutex);
+
+ err = vme_master_mmap_prepare(image[minor].resource, desc);
+ if (!err)
+ desc->vm_ops = &vme_user_vm_ops;
+
+ mutex_unlock(&image[minor].mutex);
+ return err;
+}
+
+static int vme_user_mmap_prepare(struct vm_area_desc *desc)
{
- unsigned int minor = iminor(file_inode(file));
+ const struct file *file = desc->file;
+ const unsigned int minor = iminor(file_inode(file));
if (type[minor] == MASTER_MINOR)
- return vme_user_master_mmap(minor, vma);
+ return vme_user_master_mmap_prepare(minor, desc);
return -ENODEV;
}
@@ -498,7 +507,7 @@ static const struct file_operations vme_user_fops = {
.llseek = vme_user_llseek,
.unlocked_ioctl = vme_user_unlocked_ioctl,
.compat_ioctl = compat_ptr_ioctl,
- .mmap = vme_user_mmap,
+ .mmap_prepare = vme_user_mmap_prepare,
};
static int vme_user_match(struct vme_dev *vdev)
--
2.53.0
* [PATCH 12/15] mm: allow handling of stacked mmap_prepare hooks in more drivers
While the conversion of mmap hooks to mmap_prepare is underway, we will
encounter situations where mmap hooks need to invoke nested mmap_prepare
hooks.
The nesting of mmap hooks is termed 'stacking'. In order to flexibly
facilitate the conversion of custom mmap hooks in drivers which stack, we
must split up the existing compat_vma_mmap() logic into two separate
functions:
* compat_set_desc_from_vma() - This allows the setting of a vm_area_desc
object's fields to the relevant fields of a VMA.
* __compat_vma_mmap() - Once an mmap_prepare hook has been executed upon a
vm_area_desc object, this function performs any mmap actions specified by
the mmap_prepare hook and then invokes its vm_ops->mapped() hook if any
were specified.
In ordinary cases, where a file's f_op->mmap_prepare() hook simply needs to
be invoked in a stacked mmap() hook, compat_vma_mmap() can be used.
However some drivers define their own nested hooks, which are invoked in
turn by another hook.
A concrete example is vmbus_channel->mmap_ring_buffer(), which is invoked
in turn by bin_attribute->mmap():
vmbus_channel->mmap_ring_buffer() has a signature of:
int (*mmap_ring_buffer)(struct vmbus_channel *channel,
struct vm_area_struct *vma);
And bin_attribute->mmap() has a signature of:
int (*mmap)(struct file *, struct kobject *,
const struct bin_attribute *attr,
struct vm_area_struct *vma);
And so compat_vma_mmap() cannot be used here for incremental conversion of
hooks from mmap() to mmap_prepare().
There are many such instances like this, where conversion to mmap_prepare
would otherwise cascade to a huge change set due to nesting of this kind.
The changes in this patch mean we could now instead convert
vmbus_channel->mmap_ring_buffer() to
vmbus_channel->mmap_prepare_ring_buffer(), and implement something like:
struct vm_area_desc desc;
int err;
compat_set_desc_from_vma(&desc, file, vma);
err = channel->mmap_prepare_ring_buffer(channel, &desc);
if (err)
return err;
return __compat_vma_mmap(&desc, vma);
Allowing us to incrementally update this logic, and other logic like it.
Unfortunately, as part of this change, we need to be able to flexibly
assign to the VMA descriptor, so we have to remove some of the const
declarations within the structure.
Also update the VMA tests to reflect the changes.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
include/linux/fs.h | 3 +
include/linux/mm_types.h | 4 +-
mm/util.c | 111 +++++++++++++++++++++++---------
mm/vma.h | 2 +-
tools/testing/vma/include/dup.h | 111 ++++++++++++++++++++------------
5 files changed, 157 insertions(+), 74 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c390f5c667e3..0bdccfa70b44 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2058,6 +2058,9 @@ static inline bool can_mmap_file(struct file *file)
return true;
}
+void compat_set_desc_from_vma(struct vm_area_desc *desc, const struct file *file,
+ const struct vm_area_struct *vma);
+int __compat_vma_mmap(struct vm_area_desc *desc, struct vm_area_struct *vma);
int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
int __vma_check_mmap_hook(struct vm_area_struct *vma);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1c94db0fcfb4..316bb0adf91d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -883,8 +883,8 @@ typedef struct {
*/
struct vm_area_desc {
/* Immutable state. */
- const struct mm_struct *const mm;
- struct file *const file; /* May vary from vm_file in stacked callers. */
+ struct mm_struct *mm;
+ struct file *file; /* May vary from vm_file in stacked callers. */
unsigned long start;
unsigned long end;
diff --git a/mm/util.c b/mm/util.c
index 3205bb9ab5d2..e739d7c0311c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1163,34 +1163,38 @@ void flush_dcache_folio(struct folio *folio)
EXPORT_SYMBOL(flush_dcache_folio);
#endif
-static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
+/**
+ * compat_set_desc_from_vma() - assigns VMA descriptor @desc fields from a VMA.
+ * @desc: A VMA descriptor whose fields need to be set.
+ * @file: The file object describing the file being mmap()'d.
+ * @vma: The VMA whose fields we wish to assign to @desc.
+ *
+ * This is a compatibility function to allow an mmap() hook to call
+ * mmap_prepare() hooks when drivers nest these. This function specifically
+ * allows the construction of a vm_area_desc value, @desc, from a VMA @vma for
+ * the purposes of doing this.
+ *
+ * Once the conversion of drivers is complete this function will no longer be
+ * required and will be removed.
+ */
+void compat_set_desc_from_vma(struct vm_area_desc *desc,
+ const struct file *file,
+ const struct vm_area_struct *vma)
{
- struct vm_area_desc desc = {
- .mm = vma->vm_mm,
- .file = file,
- .start = vma->vm_start,
- .end = vma->vm_end,
-
- .pgoff = vma->vm_pgoff,
- .vm_file = vma->vm_file,
- .vma_flags = vma->flags,
- .page_prot = vma->vm_page_prot,
-
- .action.type = MMAP_NOTHING, /* Default */
- };
- int err;
+ desc->mm = vma->vm_mm;
+ desc->file = (struct file *)file;
+ desc->start = vma->vm_start;
+ desc->end = vma->vm_end;
- err = vfs_mmap_prepare(file, &desc);
- if (err)
- return err;
+ desc->pgoff = vma->vm_pgoff;
+ desc->vm_file = vma->vm_file;
+ desc->vma_flags = vma->flags;
+ desc->page_prot = vma->vm_page_prot;
- err = mmap_action_prepare(&desc, &desc.action);
- if (err)
- return err;
-
- set_vma_from_desc(vma, &desc);
- return mmap_action_complete(vma, &desc.action);
+ /* Default. */
+ desc->action.type = MMAP_NOTHING;
}
+EXPORT_SYMBOL(compat_set_desc_from_vma);
static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
{
@@ -1212,6 +1216,49 @@ static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
return err;
}
+/**
+ * __compat_vma_mmap() - Similar to compat_vma_mmap(), only it allows
+ * flexibility as to how the mmap_prepare callback is invoked, which is useful
+ * for drivers which invoke nested mmap_prepare callbacks in an mmap() hook.
+ * @desc: A VMA descriptor upon which an mmap_prepare() hook has already been
+ * executed.
+ * @vma: The VMA to which @desc should be applied.
+ *
+ * The function assumes that you have obtained a VMA descriptor @desc from
+ * compat_set_desc_from_vma(), and already executed the mmap_prepare() hook upon
+ * it.
+ *
+ * It then performs any specified mmap actions, and invokes the vm_ops->mapped()
+ * hook if one is present.
+ *
+ * See the description of compat_vma_mmap() for more details.
+ *
+ * Once the conversion of drivers is complete this function will no longer be
+ * required and will be removed.
+ *
+ * Returns: 0 on success or error.
+ */
+int __compat_vma_mmap(struct vm_area_desc *desc,
+ struct vm_area_struct *vma)
+{
+ int err;
+
+ /* Perform any preparatory tasks for mmap action. */
+ err = mmap_action_prepare(desc, &desc->action);
+ if (err)
+ return err;
+ /* Update the VMA from the descriptor. */
+ compat_set_vma_from_desc(vma, desc);
+ /* Complete any specified mmap actions. */
+ err = mmap_action_complete(vma, &desc->action);
+ if (err)
+ return err;
+
+ /* Invoke vm_ops->mapped callback. */
+ return __compat_vma_mapped(desc->file, vma);
+}
+EXPORT_SYMBOL(__compat_vma_mmap);
+
/**
* compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
* existing VMA and execute any requested actions.
@@ -1219,10 +1266,10 @@ static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
* @vma: The VMA to apply the .mmap_prepare() hook to.
*
* Ordinarily, .mmap_prepare() is invoked directly upon mmap(). However, certain
- * stacked filesystems invoke a nested mmap hook of an underlying file.
+ * stacked drivers invoke a nested mmap hook of an underlying file.
*
- * Until all filesystems are converted to use .mmap_prepare(), we must be
- * conservative and continue to invoke these stacked filesystems using the
+ * Until all drivers are converted to use .mmap_prepare(), we must be
+ * conservative and continue to invoke these stacked drivers using the
* deprecated .mmap() hook.
*
* However we have a problem if the underlying file system possesses an
@@ -1233,20 +1280,22 @@ static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
* establishes a struct vm_area_desc descriptor, passes to the underlying
* .mmap_prepare() hook and applies any changes performed by it.
*
- * Once the conversion of filesystems is complete this function will no longer
- * be required and will be removed.
+ * Once the conversion of drivers is complete this function will no longer be
+ * required and will be removed.
*
* Returns: 0 on success or error.
*/
int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
{
+ struct vm_area_desc desc;
int err;
- err = __compat_vma_mmap(file, vma);
+ compat_set_desc_from_vma(&desc, file, vma);
+ err = vfs_mmap_prepare(file, &desc);
if (err)
return err;
- return __compat_vma_mapped(file, vma);
+ return __compat_vma_mmap(&desc, vma);
}
EXPORT_SYMBOL(compat_vma_mmap);
diff --git a/mm/vma.h b/mm/vma.h
index eba388c61ef4..4a8dc5d15d47 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -296,7 +296,7 @@ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
* f_op->mmap() but which might have an underlying file system which implements
* f_op->mmap_prepare().
*/
-static inline void set_vma_from_desc(struct vm_area_struct *vma,
+static inline void compat_set_vma_from_desc(struct vm_area_struct *vma,
struct vm_area_desc *desc)
{
/*
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index f95c4b8af03c..4f2c9bb6b1ea 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -490,8 +490,8 @@ enum vma_operation {
*/
struct vm_area_desc {
/* Immutable state. */
- const struct mm_struct *const mm;
- struct file *const file; /* May vary from vm_file in stacked callers. */
+ struct mm_struct *mm;
+ struct file *file; /* May vary from vm_file in stacked callers. */
unsigned long start;
unsigned long end;
@@ -1118,43 +1118,92 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma)
}
/* Declared in vma.h. */
-static inline void set_vma_from_desc(struct vm_area_struct *vma,
+static inline void compat_set_vma_from_desc(struct vm_area_struct *vma,
struct vm_area_desc *desc);
-static inline int __compat_vma_mmap(const struct file_operations *f_op,
- struct file *file, struct vm_area_struct *vma)
+static inline void compat_set_desc_from_vma(struct vm_area_desc *desc,
+ const struct file *file,
+ const struct vm_area_struct *vma)
{
- struct vm_area_desc desc = {
- .mm = vma->vm_mm,
- .file = file,
- .start = vma->vm_start,
- .end = vma->vm_end,
+ desc->mm = vma->vm_mm;
+ desc->file = (struct file *)file;
+ desc->start = vma->vm_start;
+ desc->end = vma->vm_end;
- .pgoff = vma->vm_pgoff,
- .vm_file = vma->vm_file,
- .vma_flags = vma->flags,
- .page_prot = vma->vm_page_prot,
+ desc->pgoff = vma->vm_pgoff;
+ desc->vm_file = vma->vm_file;
+ desc->vma_flags = vma->flags;
+ desc->page_prot = vma->vm_page_prot;
- .action.type = MMAP_NOTHING, /* Default */
- };
+ /* Default. */
+ desc->action.type = MMAP_NOTHING;
+}
+
+static inline unsigned long vma_pages(const struct vm_area_struct *vma)
+{
+ return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+}
+
+static inline void unmap_vma_locked(struct vm_area_struct *vma)
+{
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ mmap_assert_locked(vma->vm_mm);
+ do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
+}
+
+static inline int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
+{
+ const struct vm_operations_struct *vm_ops = vma->vm_ops;
int err;
- err = f_op->mmap_prepare(&desc);
+ if (!vm_ops->mapped)
+ return 0;
+
+ err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
+ &vma->vm_private_data);
if (err)
- return err;
+ unmap_vma_locked(vma);
+ return err;
+}
- err = mmap_action_prepare(&desc, &desc.action);
+static inline int __compat_vma_mmap(struct vm_area_desc *desc,
+ struct vm_area_struct *vma)
+{
+ int err;
+
+ /* Perform any preparatory tasks for mmap action. */
+ err = mmap_action_prepare(desc, &desc->action);
+ if (err)
+ return err;
+ /* Update the VMA from the descriptor. */
+ compat_set_vma_from_desc(vma, desc);
+ /* Complete any specified mmap actions. */
+ err = mmap_action_complete(vma, &desc->action);
if (err)
return err;
- set_vma_from_desc(vma, &desc);
- return mmap_action_complete(vma, &desc.action);
+ /* Invoke vm_ops->mapped callback. */
+ return __compat_vma_mapped(desc->file, vma);
+}
+
+static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
+{
+ return file->f_op->mmap_prepare(desc);
}
static inline int compat_vma_mmap(struct file *file,
struct vm_area_struct *vma)
{
- return __compat_vma_mmap(file->f_op, file, vma);
+ struct vm_area_desc desc;
+ int err;
+
+ compat_set_desc_from_vma(&desc, file, vma);
+ err = vfs_mmap_prepare(file, &desc);
+ if (err)
+ return err;
+
+ return __compat_vma_mmap(&desc, vma);
}
@@ -1164,11 +1213,6 @@ static inline void vma_iter_init(struct vma_iterator *vmi,
mas_init(&vmi->mas, &mm->mm_mt, addr);
}
-static inline unsigned long vma_pages(struct vm_area_struct *vma)
-{
- return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-}
-
static inline void mmap_assert_locked(struct mm_struct *);
static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm,
unsigned long start_addr,
@@ -1359,11 +1403,6 @@ static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
return file->f_op->mmap(file, vma);
}
-static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
-{
- return file->f_op->mmap_prepare(desc);
-}
-
static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
{
/* Changing an anonymous vma with this is illegal */
@@ -1371,11 +1410,3 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
swap(vma->vm_file, file);
fput(file);
}
-
-static inline void unmap_vma_locked(struct vm_area_struct *vma)
-{
- const size_t len = vma_pages(vma) << PAGE_SHIFT;
-
- mmap_assert_locked(vma->vm_mm);
- do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
-}
--
2.53.0
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 13/15] drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (11 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 12/15] mm: allow handling of stacked mmap_prepare hooks in more drivers Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 14/15] uio: replace deprecated mmap hook with mmap_prepare in uio_info Lorenzo Stoakes (Oracle)
` (2 subsequent siblings)
15 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
The f_op->mmap interface is deprecated, so update the vmbus driver to use
its successor, mmap_prepare.
This updates all callbacks which referenced the function pointer
hv_mmap_ring_buffer to instead reference hv_mmap_prepare_ring_buffer,
utilising the newly introduced compat_set_desc_from_vma() and
__compat_vma_mmap() helpers to implement this change.
The UIO HV generic driver is the only user of hv_create_ring_sysfs(), which
is the only function that references
vmbus_channel->mmap_prepare_ring_buffer, which in turn is the only
external interface to hv_mmap_prepare_ring_buffer.
This patch therefore updates this caller to use mmap_prepare instead. The
hook previously used vm_iomap_memory(), which this change replaces
with its mmap_prepare equivalent, mmap_action_simple_ioremap().
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
drivers/hv/hyperv_vmbus.h | 4 ++--
drivers/hv/vmbus_drv.c | 27 +++++++++++++++++----------
drivers/uio/uio_hv_generic.c | 11 ++++++-----
include/linux/hyperv.h | 4 ++--
4 files changed, 27 insertions(+), 19 deletions(-)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 7bd8f8486e85..31f576464f18 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -545,8 +545,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
int hv_create_ring_sysfs(struct vmbus_channel *channel,
- int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
- struct vm_area_struct *vma));
+ int (*hv_mmap_prepare_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_desc *desc));
int hv_remove_ring_sysfs(struct vmbus_channel *channel);
#endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bc4fc1951ae1..a76fa3f0588c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1951,12 +1951,19 @@ static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
struct vm_area_struct *vma)
{
struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
+ struct vm_area_desc desc;
+ int err;
/*
* hv_(create|remove)_ring_sysfs implementation ensures that mmap_ring_buffer
* is not NULL.
*/
- return channel->mmap_ring_buffer(channel, vma);
+ compat_set_desc_from_vma(&desc, filp, vma);
+ err = channel->mmap_prepare_ring_buffer(channel, &desc);
+ if (err)
+ return err;
+
+ return __compat_vma_mmap(&desc, vma);
}
static struct bin_attribute chan_attr_ring_buffer = {
@@ -2048,13 +2055,13 @@ static const struct kobj_type vmbus_chan_ktype = {
/**
* hv_create_ring_sysfs() - create "ring" sysfs entry corresponding to ring buffers for a channel.
* @channel: Pointer to vmbus_channel structure
- * @hv_mmap_ring_buffer: function pointer for initializing the function to be called on mmap of
+ * @hv_mmap_prepare_ring_buffer: function pointer for the function to be called on mmap of
* channel's "ring" sysfs node, which is for the ring buffer of that channel.
* Function pointer is of below type:
- * int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
- * struct vm_area_struct *vma))
- * This has a pointer to the channel and a pointer to vm_area_struct,
- * used for mmap, as arguments.
+ * int (*hv_mmap_prepare_ring_buffer)(struct vmbus_channel *channel,
+ * struct vm_area_desc *desc))
+ * This has a pointer to the channel and a pointer to vm_area_desc,
+ * used for mmap_prepare, as arguments.
*
* Sysfs node for ring buffer of a channel is created along with other fields, however its
* visibility is disabled by default. Sysfs creation needs to be controlled when the use-case
@@ -2071,12 +2078,12 @@ static const struct kobj_type vmbus_chan_ktype = {
* Returns 0 on success or error code on failure.
*/
int hv_create_ring_sysfs(struct vmbus_channel *channel,
- int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
- struct vm_area_struct *vma))
+ int (*hv_mmap_prepare_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_desc *desc))
{
struct kobject *kobj = &channel->kobj;
- channel->mmap_ring_buffer = hv_mmap_ring_buffer;
+ channel->mmap_prepare_ring_buffer = hv_mmap_prepare_ring_buffer;
channel->ring_sysfs_visible = true;
return sysfs_update_group(kobj, &vmbus_chan_group);
@@ -2098,7 +2105,7 @@ int hv_remove_ring_sysfs(struct vmbus_channel *channel)
channel->ring_sysfs_visible = false;
ret = sysfs_update_group(kobj, &vmbus_chan_group);
- channel->mmap_ring_buffer = NULL;
+ channel->mmap_prepare_ring_buffer = NULL;
return ret;
}
EXPORT_SYMBOL_GPL(hv_remove_ring_sysfs);
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 3f8e2e27697f..29ec2d15ada8 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -154,15 +154,16 @@ static void hv_uio_rescind(struct vmbus_channel *channel)
* The ring buffer is allocated as contiguous memory by vmbus_open
*/
static int
-hv_uio_ring_mmap(struct vmbus_channel *channel, struct vm_area_struct *vma)
+hv_uio_ring_mmap_prepare(struct vmbus_channel *channel, struct vm_area_desc *desc)
{
void *ring_buffer = page_address(channel->ringbuffer_page);
if (channel->state != CHANNEL_OPENED_STATE)
return -ENODEV;
- return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
- channel->ringbuffer_pagecount << PAGE_SHIFT);
+ mmap_action_simple_ioremap(desc, virt_to_phys(ring_buffer),
+ channel->ringbuffer_pagecount << PAGE_SHIFT);
+ return 0;
}
/* Callback from VMBUS subsystem when new channel created. */
@@ -183,7 +184,7 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
}
set_channel_read_mode(new_sc, HV_CALL_ISR);
- ret = hv_create_ring_sysfs(new_sc, hv_uio_ring_mmap);
+ ret = hv_create_ring_sysfs(new_sc, hv_uio_ring_mmap_prepare);
if (ret) {
dev_err(device, "sysfs create ring bin file failed; %d\n", ret);
vmbus_close(new_sc);
@@ -366,7 +367,7 @@ hv_uio_probe(struct hv_device *dev,
* or decoupled from uio_hv_generic probe. Userspace programs can make use of inotify
* APIs to make sure that ring is created.
*/
- hv_create_ring_sysfs(channel, hv_uio_ring_mmap);
+ hv_create_ring_sysfs(channel, hv_uio_ring_mmap_prepare);
hv_set_drvdata(dev, pdata);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index dfc516c1c719..3a721b1853a4 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1015,8 +1015,8 @@ struct vmbus_channel {
/* The max size of a packet on this channel */
u32 max_pkt_size;
- /* function to mmap ring buffer memory to the channel's sysfs ring attribute */
- int (*mmap_ring_buffer)(struct vmbus_channel *channel, struct vm_area_struct *vma);
+ /* function to mmap_prepare ring buffer memory to the channel's sysfs ring attribute */
+ int (*mmap_prepare_ring_buffer)(struct vmbus_channel *channel, struct vm_area_desc *desc);
/* boolean to control visibility of sysfs for ring buffer */
bool ring_sysfs_visible;
--
2.53.0
* [PATCH 14/15] uio: replace deprecated mmap hook with mmap_prepare in uio_info
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (12 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 13/15] drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]() Lorenzo Stoakes (Oracle)
2026-03-12 21:23 ` [PATCH 00/15] mm: expand mmap_prepare functionality and usage Andrew Morton
15 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
The f_op->mmap interface is deprecated, so update uio_info to use its
successor, mmap_prepare.
Therefore, replace the uio_info->mmap hook with a new
uio_info->mmap_prepare hook, and update its one user, target_core_user, to
specify this new mmap_prepare hook and to use the new vm_ops->mapped()
hook to continue to maintain a correct udev->kref refcount.
Then update uio_mmap() to utilise the mmap_prepare compatibility layer to
invoke this callback from the uio mmap invocation.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
drivers/target/target_core_user.c | 26 ++++++++++++++++++--------
drivers/uio/uio.c | 10 ++++++++--
include/linux/uio_driver.h | 4 ++--
3 files changed, 28 insertions(+), 12 deletions(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index af95531ddd35..9d211dad5e53 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -1860,6 +1860,17 @@ static struct page *tcmu_try_get_data_page(struct tcmu_dev *udev, uint32_t dpi)
return NULL;
}
+static int tcmu_vma_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+ const struct file *file, void **vm_private_data)
+{
+ struct tcmu_dev *udev = *vm_private_data;
+
+ pr_debug("vma_mapped\n");
+
+ kref_get(&udev->kref);
+ return 0;
+}
+
static void tcmu_vma_open(struct vm_area_struct *vma)
{
struct tcmu_dev *udev = vma->vm_private_data;
@@ -1919,26 +1930,25 @@ static vm_fault_t tcmu_vma_fault(struct vm_fault *vmf)
}
static const struct vm_operations_struct tcmu_vm_ops = {
+ .mapped = tcmu_vma_mapped,
.open = tcmu_vma_open,
.close = tcmu_vma_close,
.fault = tcmu_vma_fault,
};
-static int tcmu_mmap(struct uio_info *info, struct vm_area_struct *vma)
+static int tcmu_mmap_prepare(struct uio_info *info, struct vm_area_desc *desc)
{
struct tcmu_dev *udev = container_of(info, struct tcmu_dev, uio_info);
- vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
- vma->vm_ops = &tcmu_vm_ops;
+ vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT, VMA_DONTDUMP_BIT);
+ desc->vm_ops = &tcmu_vm_ops;
- vma->vm_private_data = udev;
+ desc->private_data = udev;
/* Ensure the mmap is exactly the right size */
- if (vma_pages(vma) != udev->mmap_pages)
+ if (vma_desc_pages(desc) != udev->mmap_pages)
return -EINVAL;
- tcmu_vma_open(vma);
-
return 0;
}
@@ -2253,7 +2263,7 @@ static int tcmu_configure_device(struct se_device *dev)
info->irqcontrol = tcmu_irqcontrol;
info->irq = UIO_IRQ_CUSTOM;
- info->mmap = tcmu_mmap;
+ info->mmap_prepare = tcmu_mmap_prepare;
info->open = tcmu_open;
info->release = tcmu_release;
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 5a4998e2caf8..1e4ade78ed84 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -850,8 +850,14 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
goto out;
}
- if (idev->info->mmap) {
- ret = idev->info->mmap(idev->info, vma);
+ if (idev->info->mmap_prepare) {
+ struct vm_area_desc desc;
+
+ compat_set_desc_from_vma(&desc, filep, vma);
+ ret = idev->info->mmap_prepare(idev->info, &desc);
+ if (ret)
+ goto out;
+ ret = __compat_vma_mmap(&desc, vma);
goto out;
}
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 334641e20fb1..53bdc557c423 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -97,7 +97,7 @@ struct uio_device {
* @irq_flags: flags for request_irq()
* @priv: optional private data
* @handler: the device's irq handler
- * @mmap: mmap operation for this uio device
+ * @mmap_prepare: mmap_prepare operation for this uio device
* @open: open operation for this uio device
* @release: release operation for this uio device
* @irqcontrol: disable/enable irqs when 0/1 is written to /dev/uioX
@@ -112,7 +112,7 @@ struct uio_info {
unsigned long irq_flags;
void *priv;
irqreturn_t (*handler)(int irq, struct uio_info *dev_info);
- int (*mmap)(struct uio_info *info, struct vm_area_struct *vma);
+ int (*mmap_prepare)(struct uio_info *info, struct vm_area_desc *desc);
int (*open)(struct uio_info *info, struct inode *inode);
int (*release)(struct uio_info *info, struct inode *inode);
int (*irqcontrol)(struct uio_info *info, s32 irq_on);
--
2.53.0
* [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]()
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (13 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 14/15] uio: replace deprecated mmap hook with mmap_prepare in uio_info Lorenzo Stoakes (Oracle)
@ 2026-03-12 20:27 ` Lorenzo Stoakes (Oracle)
2026-03-12 23:15 ` Randy Dunlap
2026-03-12 21:23 ` [PATCH 00/15] mm: expand mmap_prepare functionality and usage Andrew Morton
15 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
To: Andrew Morton
A user can invoke mmap_action_map_kernel_pages() to specify that the
mapping should map kernel pages, provided in an array of a specified
length, starting from desc->start.
In order to implement this, adjust mmap_action_prepare() to be able to
return an error code, as it makes sense to validate the specified
parameters as early as possible, as well as to update the VMA flags to
include VMA_MIXEDMAP_BIT as necessary.
This provides an mmap_prepare equivalent of vm_insert_pages().
We additionally update the existing vm_insert_pages() code to use
range_in_vma() and add a new range_in_vma_desc() helper function for the
mmap_prepare case, sharing the code between the two in range_is_subset().
We add both mmap_action_map_kernel_pages() and
mmap_action_map_kernel_pages_full() to allow for both partial and full VMA
mappings.
We also add mmap_action_map_kernel_pages_discontig() to allow for
discontiguous mapping of kernel pages should the need arise.
We update the documentation to reflect the new features.
Finally, we update the VMA tests accordingly to reflect the changes.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
Documentation/filesystems/mmap_prepare.rst | 8 ++
include/linux/mm.h | 94 +++++++++++++++++++++-
include/linux/mm_types.h | 7 ++
mm/memory.c | 42 +++++++++-
mm/util.c | 6 ++
tools/testing/vma/include/dup.h | 7 ++
6 files changed, 159 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
index d21406848bca..f89718285869 100644
--- a/Documentation/filesystems/mmap_prepare.rst
+++ b/Documentation/filesystems/mmap_prepare.rst
@@ -129,5 +129,13 @@ pointer. These are:
* `mmap_action_simple_ioremap()` - Sets up an I/O remap from a specified
physical address and over a specified length.
+* `mmap_action_map_kernel_pages()` - Maps a specified array of `struct page`
+ pointers in the VMA from a specific offset.
+
+* `mmap_action_map_kernel_pages_full()` - Maps a specified array of `struct
+ page` pointers over the entire VMA. The caller must ensure there are
+ sufficient entries in the page array to cover the entire range of the
+ described VMA.
+
**NOTE:** The 'action' field should never normally be manipulated directly,
rather you ought to use one of these helpers.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 88f42faeb377..88ad5649c02d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4160,6 +4160,45 @@ static inline void mmap_action_simple_ioremap(struct vm_area_desc *desc,
action->type = MMAP_SIMPLE_IO_REMAP;
}
+/**
+ * mmap_action_map_kernel_pages - helper for mmap_prepare hook to specify that
+ * @num kernel pages contained in the @pages array should be mapped to userland
+ * starting at virtual address @start.
+ * @desc: The VMA descriptor for the VMA requiring kernel pages to be mapped.
+ * @start: The virtual address from which to map them.
+ * @pages: An array of struct page pointers describing the memory to map.
+ * @nr_pages: The number of entries in the @pages array.
+ */
+static inline void mmap_action_map_kernel_pages(struct vm_area_desc *desc,
+ unsigned long start, struct page **pages,
+ unsigned long nr_pages)
+{
+ struct mmap_action *action = &desc->action;
+
+ action->type = MMAP_MAP_KERNEL_PAGES;
+ action->map_kernel.start = start;
+ action->map_kernel.pages = pages;
+ action->map_kernel.nr_pages = nr_pages;
+ action->map_kernel.pgoff = desc->pgoff;
+}
+
+/**
+ * mmap_action_map_kernel_pages_full - helper for mmap_prepare hook to specify that
+ * kernel pages contained in the @pages array should be mapped to userland
+ * from @desc->start to @desc->end.
+ * @desc: The VMA descriptor for the VMA requiring kernel pages to be mapped.
+ * @pages: An array of struct page pointers describing the memory to map.
+ *
+ * The caller must ensure that @pages contains sufficient entries to cover the
+ * entire range described by @desc.
+ */
+static inline void mmap_action_map_kernel_pages_full(struct vm_area_desc *desc,
+ struct page **pages)
+{
+ mmap_action_map_kernel_pages(desc, desc->start, pages,
+ vma_desc_pages(desc));
+}
+
int mmap_action_prepare(struct vm_area_desc *desc,
struct mmap_action *action);
int mmap_action_complete(struct vm_area_struct *vma,
@@ -4177,10 +4216,59 @@ static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
return vma;
}
+/**
+ * range_is_subset - Is the specified inner range a subset of the outer range?
+ * @outer_start: The start of the outer range.
+ * @outer_end: The exclusive end of the outer range.
+ * @inner_start: The start of the inner range.
+ * @inner_end: The exclusive end of the inner range.
+ *
+ * Returns %true if [inner_start, inner_end) is a subset of [outer_start,
+ * outer_end), otherwise %false.
+ */
+static inline bool range_is_subset(unsigned long outer_start,
+ unsigned long outer_end,
+ unsigned long inner_start,
+ unsigned long inner_end)
+{
+ return outer_start <= inner_start && inner_end <= outer_end;
+}
+
+/**
+ * range_in_vma - is the specified [@start, @end) range a subset of the VMA?
+ * @vma: The VMA against which we want to check [@start, @end).
+ * @start: The start of the range we wish to check.
+ * @end: The exclusive end of the range we wish to check.
+ *
+ * Returns %true if [@start, @end) is a subset of [@vma->vm_start,
+ * @vma->vm_end), %false otherwise.
+ */
static inline bool range_in_vma(const struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- return (vma && vma->vm_start <= start && end <= vma->vm_end);
+ if (!vma)
+ return false;
+
+ return range_is_subset(vma->vm_start, vma->vm_end, start, end);
+}
+
+/**
+ * range_in_vma_desc - is the specified [@start, @end) range a subset of the VMA
+ * described by @desc, a VMA descriptor?
+ * @desc: The VMA descriptor against which we want to check [@start, @end).
+ * @start: The start of the range we wish to check.
+ * @end: The exclusive end of the range we wish to check.
+ *
+ * Returns %true if [@start, @end) is a subset of [@desc->start, @desc->end),
+ * %false otherwise.
+ */
+static inline bool range_in_vma_desc(const struct vm_area_desc *desc,
+ unsigned long start, unsigned long end)
+{
+ if (!desc)
+ return false;
+
+ return range_is_subset(desc->start, desc->end, start, end);
}
#ifdef CONFIG_MMU
@@ -4212,6 +4300,10 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
struct page **pages, unsigned long *num);
+int map_kernel_pages_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action);
+int map_kernel_pages_complete(struct vm_area_struct *vma,
+ struct mmap_action *action);
int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
unsigned long num);
int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 316bb0adf91d..6e7a399f0724 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -815,6 +815,7 @@ enum mmap_action_type {
MMAP_REMAP_PFN, /* Remap PFN range. */
MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
MMAP_SIMPLE_IO_REMAP, /* I/O remap with guardrails. */
+ MMAP_MAP_KERNEL_PAGES, /* Map kernel page range from array. */
};
/*
@@ -833,6 +834,12 @@ struct mmap_action {
phys_addr_t start_phys_addr;
unsigned long size;
} simple_ioremap;
+ struct {
+ unsigned long start;
+ struct page **pages;
+ unsigned long nr_pages;
+ pgoff_t pgoff;
+ } map_kernel;
};
enum mmap_action_type type;
diff --git a/mm/memory.c b/mm/memory.c
index 351cc917b7aa..608a98c4c947 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2484,13 +2484,14 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr,
int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
struct page **pages, unsigned long *num)
{
- const unsigned long end_addr = addr + (*num * PAGE_SIZE) - 1;
+ const unsigned long nr_pages = *num;
+ const unsigned long end = addr + PAGE_SIZE * nr_pages;
- if (addr < vma->vm_start || end_addr >= vma->vm_end)
+ if (!range_in_vma(vma, addr, end))
return -EFAULT;
if (!(vma->vm_flags & VM_MIXEDMAP)) {
- BUG_ON(mmap_read_trylock(vma->vm_mm));
- BUG_ON(vma->vm_flags & VM_PFNMAP);
+ VM_WARN_ON_ONCE(mmap_read_trylock(vma->vm_mm));
+ VM_WARN_ON_ONCE(vma->vm_flags & VM_PFNMAP);
vm_flags_set(vma, VM_MIXEDMAP);
}
/* Defer page refcount checking till we're about to map that page. */
@@ -2498,6 +2499,39 @@ int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
}
EXPORT_SYMBOL(vm_insert_pages);
+int map_kernel_pages_prepare(struct vm_area_desc *desc,
+ struct mmap_action *action)
+{
+ const unsigned long addr = action->map_kernel.start;
+ unsigned long nr_pages, end;
+
+ if (!vma_desc_test(desc, VMA_MIXEDMAP_BIT)) {
+ VM_WARN_ON_ONCE(mmap_read_trylock(desc->mm));
+ VM_WARN_ON_ONCE(vma_desc_test(desc, VMA_PFNMAP_BIT));
+ vma_desc_set_flags(desc, VMA_MIXEDMAP_BIT);
+ }
+
+ nr_pages = action->map_kernel.nr_pages;
+ end = addr + PAGE_SIZE * nr_pages;
+ if (!range_in_vma_desc(desc, addr, end))
+ return -EFAULT;
+
+ return 0;
+}
+EXPORT_SYMBOL(map_kernel_pages_prepare);
+
+int map_kernel_pages_complete(struct vm_area_struct *vma,
+ struct mmap_action *action)
+{
+ unsigned long nr_pages;
+
+ nr_pages = action->map_kernel.nr_pages;
+ return insert_pages(vma, action->map_kernel.start,
+ action->map_kernel.pages,
+ &nr_pages, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(map_kernel_pages_complete);
+
/**
* vm_insert_page - insert single page into user vma
* @vma: user vma to map to
diff --git a/mm/util.c b/mm/util.c
index e739d7c0311c..7934e303b230 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1445,6 +1445,8 @@ int mmap_action_prepare(struct vm_area_desc *desc,
return io_remap_pfn_range_prepare(desc, action);
case MMAP_SIMPLE_IO_REMAP:
return simple_ioremap_prepare(desc, action);
+ case MMAP_MAP_KERNEL_PAGES:
+ return map_kernel_pages_prepare(desc, action);
}
}
EXPORT_SYMBOL(mmap_action_prepare);
@@ -1473,6 +1475,9 @@ int mmap_action_complete(struct vm_area_struct *vma,
case MMAP_IO_REMAP_PFN:
err = io_remap_pfn_range_complete(vma, action);
break;
+ case MMAP_MAP_KERNEL_PAGES:
+ err = map_kernel_pages_complete(vma, action);
+ break;
case MMAP_SIMPLE_IO_REMAP:
/*
* The simple I/O remap should have been delegated to an I/O
@@ -1496,6 +1501,7 @@ int mmap_action_prepare(struct vm_area_desc *desc,
case MMAP_REMAP_PFN:
case MMAP_IO_REMAP_PFN:
case MMAP_SIMPLE_IO_REMAP:
+ case MMAP_MAP_KERNEL_PAGES:
WARN_ON_ONCE(1); /* nommu cannot handle these. */
break;
}
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 4f2c9bb6b1ea..50ef2f62150d 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -425,6 +425,7 @@ enum mmap_action_type {
MMAP_REMAP_PFN, /* Remap PFN range. */
MMAP_IO_REMAP_PFN, /* I/O remap PFN range. */
MMAP_SIMPLE_IO_REMAP, /* I/O remap with guardrails. */
+ MMAP_MAP_KERNEL_PAGES, /* Map kernel page range from an array. */
};
/*
@@ -443,6 +444,12 @@ struct mmap_action {
phys_addr_t start;
unsigned long len;
} simple_ioremap;
+ struct {
+ unsigned long start;
+ struct page **pages;
+ unsigned long nr_pages;
+ pgoff_t pgoff;
+ } map_kernel;
};
enum mmap_action_type type;
--
2.53.0
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-12 20:27 ` [PATCH 01/15] mm: various small mmap_prepare cleanups Lorenzo Stoakes (Oracle)
@ 2026-03-12 21:14 ` Andrew Morton
2026-03-13 12:13 ` Lorenzo Stoakes (Oracle)
2026-03-15 22:56 ` Suren Baghdasaryan
1 sibling, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2026-03-12 21:14 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
On Thu, 12 Mar 2026 20:27:16 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> +int mmap_action_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action)
> +
> {
> switch (action->type) {
> case MMAP_NOTHING:
> - break;
> + return 0;
> case MMAP_REMAP_PFN:
> - remap_pfn_range_prepare(desc, action->remap.start_pfn);
> - break;
> + return remap_pfn_range_prepare(desc, action);
> case MMAP_IO_REMAP_PFN:
> - io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> - action->remap.size);
> - break;
> + return io_remap_pfn_range_prepare(desc, action);
> }
> }
> EXPORT_SYMBOL(mmap_action_prepare);
hm, was this the correct version?
mm/util.c: In function 'mmap_action_prepare':
mm/util.c:1451:1: error: control reaches end of non-void function [-Werror=return-type]
1451 | }
--- a/mm/util.c~mm-various-small-mmap_prepare-cleanups-fix
+++ a/mm/util.c
@@ -1356,6 +1356,8 @@ int mmap_action_prepare(struct vm_area_d
return remap_pfn_range_prepare(desc, action);
case MMAP_IO_REMAP_PFN:
return io_remap_pfn_range_prepare(desc, action);
+ default:
+ BUG();
}
}
EXPORT_SYMBOL(mmap_action_prepare);
_
* Re: [PATCH 00/15] mm: expand mmap_prepare functionality and usage
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
` (14 preceding siblings ...)
2026-03-12 20:27 ` [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]() Lorenzo Stoakes (Oracle)
@ 2026-03-12 21:23 ` Andrew Morton
15 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2026-03-12 21:23 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
On Thu, 12 Mar 2026 20:27:15 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> This series expands the mmap_prepare functionality, which is intended to
> replace the deprecated f_op->mmap hook which has been the source of bugs
> and security issues for some time.
Thanks, I've added this to mm.git's mm-new branch.
* Re: [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]()
2026-03-12 20:27 ` [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]() Lorenzo Stoakes (Oracle)
@ 2026-03-12 23:15 ` Randy Dunlap
2026-03-16 14:54 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Randy Dunlap @ 2026-03-12 23:15 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
On 3/12/26 1:27 PM, Lorenzo Stoakes (Oracle) wrote:
> Finally, we update the VMA tests accordingly to reflect the changes.
IMO we could omit the word "we" 5 times above.
(but no change is required)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 88f42faeb377..88ad5649c02d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> +/**
> + * range_is_subset - Is the specified inner range a subset of the outer range?
> + * @outer_start: The start of the outer range.
> + * @outer_end: The exclusive end of the outer range.
> + * @inner_start: The start of the inner range.
> + * @inner_end: The exclusive end of the inner range.
> + *
> + * Returns %true if [inner_start, inner_end) is a subset of [outer_start,
* Returns:
(for kernel-doc)
> + * outer_end), otherwise %false.
> + */
> +static inline bool range_is_subset(unsigned long outer_start,
> + unsigned long outer_end,
> + unsigned long inner_start,
> + unsigned long inner_end)
> +{
> + return outer_start <= inner_start && inner_end <= outer_end;
> +}
> +
> +/**
> + * range_in_vma - is the specified [@start, @end) range a subset of the VMA?
> + * @vma: The VMA against which we want to check [@start, @end).
> + * @start: The start of the range we wish to check.
> + * @end: The exclusive end of the range we wish to check.
> + *
> + * Returns %true if [@start, @end) is a subset of [@vma->vm_start,
* Returns:
> + * @vma->vm_end), %false otherwise.
> + */
> static inline bool range_in_vma(const struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> - return (vma && vma->vm_start <= start && end <= vma->vm_end);
> + if (!vma)
> + return false;
> +
> + return range_is_subset(vma->vm_start, vma->vm_end, start, end);
> +}
> +
> +/**
> + * range_in_vma_desc - is the specified [@start, @end) range a subset of the VMA
> + * described by @desc, a VMA descriptor?
> + * @desc: The VMA descriptor against which we want to check [@start, @end).
> + * @start: The start of the range we wish to check.
> + * @end: The exclusive end of the range we wish to check.
> + *
> + * Returns %true if [@start, @end) is a subset of [@desc->start, @desc->end),
* Returns:
> + * %false otherwise.
> + */
> +static inline bool range_in_vma_desc(const struct vm_area_desc *desc,
> + unsigned long start, unsigned long end)
> +{
> + if (!desc)
> + return false;
> +
> + return range_is_subset(desc->start, desc->end, start, end);
> }
--
~Randy
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
2026-03-12 20:27 ` [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback Lorenzo Stoakes (Oracle)
@ 2026-03-13 0:12 ` Randy Dunlap
2026-03-16 14:51 ` Lorenzo Stoakes (Oracle)
2026-03-15 23:23 ` Suren Baghdasaryan
1 sibling, 1 reply; 45+ messages in thread
From: Randy Dunlap @ 2026-03-13 0:12 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
[-- Attachment #1: Type: text/plain, Size: 6544 bytes --]
(Andrew: patch attached)
On 3/12/26 1:27 PM, Lorenzo Stoakes (Oracle) wrote:
Documentation/filesystems/mmap_prepare.rst: WARNING: document isn't included in any toctree [toc.not_included]
Should be in some index.rst file. In filesystems I suppose.
> ---
> Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> 1 file changed, 131 insertions(+)
> create mode 100644 Documentation/filesystems/mmap_prepare.rst
>
> diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> new file mode 100644
> index 000000000000..76908200f3a1
> --- /dev/null
> +++ b/Documentation/filesystems/mmap_prepare.rst
> @@ -0,0 +1,131 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================
> +mmap_prepare callback HOWTO
> +===========================
> +
> +Introduction
> +############
Kernel style is "=============" above instead of "############".
> +
> +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> +stability and security risk, and doesn't always permit the merging of adjacent
> +mappings resulting in unnecessary memory fragmentation.
> +
> +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> +these problems.
> +
> +## How To Use
> +
> +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> +callback rather than an `mmap` one, e.g. for ext4:
> +
> +
> +.. code-block:: C
> +
> + const struct file_operations ext4_file_operations = {
> + ...
> + .mmap_prepare = ext4_file_mmap_prepare,
> + };
> +
> +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> +
> +Examining the `struct vm_area_desc` type:
> +
> +.. code-block:: C
> +
> + struct vm_area_desc {
> + /* Immutable state. */
> + const struct mm_struct *const mm;
> + struct file *const file; /* May vary from vm_file in stacked callers. */
> + unsigned long start;
> + unsigned long end;
> +
> + /* Mutable fields. Populated with initial state. */
> + pgoff_t pgoff;
> + struct file *vm_file;
> + vma_flags_t vma_flags;
> + pgprot_t page_prot;
> +
> + /* Write-only fields. */
> + const struct vm_operations_struct *vm_ops;
> + void *private_data;
> +
> + /* Take further action? */
> + struct mmap_action action;
> + };
> +
> +This is straightforward - you have all the fields you need to set up the
> +mapping, and you can update the mutable and writable fields, for instance:
> +
> +.. code-block:: Cw
.. code-block:: C
Documentation/filesystems/mmap_prepare.rst:60: WARNING: Pygments lexer name 'Cw' is not known [misc.highlighting_failure]
Maybe a typo?
> +
> + static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> + {
> + int ret;
> + struct file *file = desc->file;
> + struct inode *inode = file->f_mapping->host;
> +
> + ...
> +
> + file_accessed(file);
> + if (IS_DAX(file_inode(file))) {
> + desc->vm_ops = &ext4_dax_vm_ops;
> + vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> + } else {
> + desc->vm_ops = &ext4_file_vm_ops;
> + }
> + return 0;
> + }
> +
> +Importantly, you no longer have to dance around with reference counts or locks
> +when updating these fields - __you can simply go ahead and change them__.
> +
> +Everything is taken care of by the mapping code.
> +
> +VMA Flags
> +=========
and then use "---------------" here instead of "==============".
(from Documentation/doc-guide/sphinx.rst)
> +
> +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> +locking done correctly for you), this is no longer necessary.
> +
> +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> +etc. - i.e. using a `VM_xxx` macro has changed too.
> +
> +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> +of (where `desc` is a pointer to `struct vma_area_desc`):
> +
> +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> + wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
> + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> + otherwise `false`.
> +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> + additional flags specified by a comma-separated list,
> + e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> + flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> +
> +Actions
> +=======
> +
> +You can now very easily have actions be performed upon a mapping once set up by
> +utilising simple helper functions invoked upon the `struct vm_area_desc`
> +pointer. These are:
> +
> +* `mmap_action_remap()` - Remaps a range consisting only of PFNs, starting at
> +  a given virtual address and PFN, of a set size.
> +
> +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> + entire mapping from `start_pfn` onward.
> +
> +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> + remap.
> +
> +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> + the entire mapping from `start_pfn` onward.
> +
> +**NOTE:** The 'action' field should never normally be manipulated directly,
> +rather you ought to use one of these helpers.
I also see this warning, but I don't know what it is referring to:
Documentation/filesystems/mmap_prepare.rst:132: ERROR: Anonymous hyperlink mismatch: 1 references but 0 targets.
See "backrefs" attribute for IDs. [docutils]
(OK, I found/fixed that also.)
There are also lots of single ` marks which mean italics. I thought those were
not what was intended, so I changed (most of) them to `` marks, which means
"code block / monospace". I can fix those if needed.
from the patch file:
@Lorenzo: ISTR that you prefer explicit quoting on structs and
functions. I didn't do that here since kernel automarkup does that,
but if you prefer, I can redo the patch with those changes.
HTH.
--
~Randy
[-- Attachment #2: mmap-prepare-docs-fixes.patch --]
[-- Type: text/x-patch, Size: 7252 bytes --]
From: Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH] Docs: mmap_prepare: fix sphinx warnings and format
Fix 'make htmldocs' build warnings, headings style, and quoting
style.
Documentation/filesystems/mmap_prepare.rst: WARNING: document isn't included in any toctree [toc.not_included]
Documentation/filesystems/mmap_prepare.rst:60: WARNING: Pygments lexer name 'Cw' is not known [misc.highlighting_failure]
Documentation/filesystems/mmap_prepare.rst:132: ERROR: Anonymous hyperlink mismatch: 1 references but 0 targets.
See "backrefs" attribute for IDs. [docutils]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
@Lorenzo: ISTR that you prefer explicit quoting on structs and
functions. I didn't do that here since kernel automarkup does that,
but if you prefer, I can redo the patch with those changes.
Documentation/filesystems/index.rst | 1
Documentation/filesystems/mmap_prepare.rst | 74 +++++++++----------
2 files changed, 38 insertions(+), 37 deletions(-)
--- linux-next.orig/Documentation/filesystems/index.rst
+++ linux-next/Documentation/filesystems/index.rst
@@ -29,6 +29,7 @@ algorithms work.
fiemap
files
locks
+ mmap_prepare
multigrain-ts
mount_api
quota
--- linux-next.orig/Documentation/filesystems/mmap_prepare.rst
+++ linux-next/Documentation/filesystems/mmap_prepare.rst
@@ -5,19 +5,19 @@ mmap_prepare callback HOWTO
===========================
Introduction
-############
+============
-The `struct file->f_op->mmap()` callback has been deprecated as it is both a
+The ``struct file->f_op->mmap()`` callback has been deprecated as it is both a
stability and security risk, and doesn't always permit the merging of adjacent
mappings resulting in unnecessary memory fragmentation.
-It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
-these problems.
+It has been replaced with the ``file->f_op->mmap_prepare()`` callback which
+solves these problems.
## How To Use
-In your driver's `struct file_operations` struct, specify an `mmap_prepare`
-callback rather than an `mmap` one, e.g. for ext4:
+In your driver's struct file_operations struct, specify an ``mmap_prepare``
+callback rather than an ``mmap`` one, e.g. for ext4:
.. code-block:: C
@@ -27,9 +27,9 @@ callback rather than an `mmap` one, e.g.
.mmap_prepare = ext4_file_mmap_prepare,
};
-This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
+This has a signature of ``int (*mmap_prepare)(struct vm_area_desc *)``.
-Examining the `struct vm_area_desc` type:
+Examining the struct vm_area_desc type:
.. code-block:: C
@@ -57,7 +57,7 @@ Examining the `struct vm_area_desc` type
This is straightforward - you have all the fields you need to set up the
mapping, and you can update the mutable and writable fields, for instance:
-.. code-block:: Cw
+.. code-block:: C
static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
{
@@ -78,54 +78,54 @@ mapping, and you can update the mutable
}
Importantly, you no longer have to dance around with reference counts or locks
-when updating these fields - __you can simply go ahead and change them__.
+when updating these fields - **you can simply go ahead and change them**.
Everything is taken care of by the mapping code.
VMA Flags
-=========
+---------
-Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
-you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
-`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
+Along with ``mmap_prepare``, VMA flags have undergone an overhaul. Where before
+you would invoke one of vm_flags_init(), vm_flags_reset(), vm_flags_set(),
+vm_flags_clear(), and vm_flags_mod() to modify flags (and to have the
locking done correctly for you, this is no longer necessary.
-Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
-etc. - i.e. using a `VM_xxx` macro has changed too.
+Also, the legacy approach of specifying VMA flags via ``VM_READ``, ``VM_WRITE``,
+etc. - i.e. using a ``VM_xxx`` macro has changed too.
-When implementing `mmap_prepare()`, reference flags by their bit number, defined
-as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
-of (where `desc` is a pointer to `struct vma_area_desc`):
-
-* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
- wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
- VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
- otherwise `false`.
-* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
+When implementing mmap_prepare(), reference flags by their bit number, defined
+as a ``VMA_xxx_BIT`` macro, e.g. ``VMA_READ_BIT``, ``VMA_WRITE_BIT`` etc.,
+and use one of (where ``desc`` is a pointer to struct vma_area_desc):
+
+* ``vma_desc_test_flags(desc, ...)`` - Specify a comma-separated list of flags
+ you wish to test for (whether _any_ are set), e.g. - ``vma_desc_test_flags(
+ desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`` - returns ``true`` if either are set,
+ otherwise ``false``.
+* ``vma_desc_set_flags(desc, ...)`` - Update the VMA descriptor flags to set
additional flags specified by a comma-separated list,
- e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
-* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
- flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
- VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
+ e.g. - ``vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)``.
+* ``vma_desc_clear_flags(desc, ...)`` - Update the VMA descriptor flags to clear
+ flags specified by a comma-separated list, e.g. - ``vma_desc_clear_flags(
+ desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)``.
Actions
=======
You can now very easily have actions be performed upon a mapping once set up by
-utilising simple helper functions invoked upon the `struct vm_area_desc`
+utilising simple helper functions invoked upon the struct vm_area_desc
pointer. These are:
-* `mmap_action_remap()` - Remaps a range consisting only of PFNs for a specific
+* mmap_action_remap() - Remaps a range consisting only of PFNs for a specific
range starting a virtual address and PFN number of a set size.
-* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
- entire mapping from `start_pfn` onward.
+* mmap_action_remap_full() - Same as mmap_action_remap(), only remaps the
+ entire mapping from ``start_pfn`` onward.
-* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
+* mmap_action_ioremap() - Same as mmap_action_remap(), only performs an I/O
remap.
-* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
- the entire mapping from `start_pfn` onward.
+* mmap_action_ioremap_full() - Same as mmap_action_ioremap(), only remaps
+ the entire mapping from ``start_pfn`` onward.
-**NOTE:** The 'action' field should never normally be manipulated directly,
+**NOTE:** The ``action`` field should never normally be manipulated directly,
rather you ought to use one of these helpers.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-12 20:27 ` [PATCH 04/15] mm: add vm_ops->mapped hook Lorenzo Stoakes (Oracle)
@ 2026-03-13 11:02 ` Usama Arif
2026-03-13 11:58 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Usama Arif @ 2026-03-13 11:02 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> Previously, when a driver needed to do something like establish a reference
> count, it could do so in the mmap hook in the knowledge that the mapping
> would succeed.
>
> With the introduction of f_op->mmap_prepare this is no longer the case, as
> it is invoked prior to actually establishing the mapping.
>
> To take this into account, introduce a new vm_ops->mapped callback which is
> invoked when the VMA is first mapped (though notably - not when it is
> merged - which is correct and mirrors existing mmap/open/close behaviour).
>
> We do better than vm_ops->open() here, as this callback can return an
> error, at which point the VMA will be unmapped.
>
> Note that vm_ops->mapped() is invoked after any mmap action is
> complete (such as I/O remapping).
>
> We intentionally do not expose the VMA at this point, exposing only the
> fields that could be used, and an output parameter in case the operation
> needs to update the vma->vm_private_data field.
>
> In order to deal with stacked filesystems which invoke inner filesystem's
> mmap() invocations, add __compat_vma_mapped() and invoke it on
> vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> callback.
>
> We can now also remove call_action_complete() and invoke
> mmap_action_complete() directly, as we separate out the rmap lock logic to
> be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
>
> We also abstract unmapping of a VMA on mmap action completion into its own
> helper function, unmap_vma_locked().
>
> Additionally, update VMA userland test headers to reflect the change.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> include/linux/fs.h | 9 +++-
> include/linux/mm.h | 17 +++++++
> mm/internal.h | 10 ++++
> mm/util.c | 86 ++++++++++++++++++++++++---------
> mm/vma.c | 41 +++++++++++-----
> tools/testing/vma/include/dup.h | 34 ++++++++++++-
> 6 files changed, 158 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a2628a12bd2b..c390f5c667e3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> }
>
> int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> +int __vma_check_mmap_hook(struct vm_area_struct *vma);
>
> static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> {
> + int err;
> +
> if (file->f_op->mmap_prepare)
> return compat_vma_mmap(file, vma);
>
> - return file->f_op->mmap(file, vma);
> + err = file->f_op->mmap(file, vma);
> + if (err)
> + return err;
> +
> + return __vma_check_mmap_hook(vma);
> }
>
> static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 12a0b4c63736..7333d5db1221 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -759,6 +759,23 @@ struct vm_operations_struct {
> * Context: User context. May sleep. Caller holds mmap_lock.
> */
> void (*close)(struct vm_area_struct *vma);
> + /**
> + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> + * the new VMA is merged with an adjacent VMA.
> + *
> + * The @vm_private_data field is an output field allowing the user to
> + * modify vma->vm_private_data as necessary.
> + *
> + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> + * set from f_op->mmap.
> + *
> + * Returns %0 on success, or an error otherwise. On error, the VMA will
> + * be unmapped.
> + *
> + * Context: User context. May sleep. Caller holds mmap_lock.
> + */
> + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> + const struct file *file, void **vm_private_data);
> /* Called any time before splitting to check if it's allowed */
> int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> int (*mremap)(struct vm_area_struct *vma);
> diff --git a/mm/internal.h b/mm/internal.h
> index 7bfa85b5e78b..f0f2cf1caa36 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> * mmap hook and safely handle error conditions. On error, VMA hooks will be
> * mutated.
> *
> + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> + *
> * @file: File which backs the mapping.
> * @vma: VMA which we are mapping.
> *
> @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> /* unmap_vmas is in mm/memory.c */
> void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
>
> +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> +{
> + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> + mmap_assert_locked(vma->vm_mm);
> + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> +}
> +
> #ifdef CONFIG_MMU
>
> static inline void get_anon_vma(struct anon_vma *anon_vma)
> diff --git a/mm/util.c b/mm/util.c
> index dba1191725b6..2b0ed54008d6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> EXPORT_SYMBOL(flush_dcache_folio);
> #endif
>
> +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct vm_area_desc desc = {
> + .mm = vma->vm_mm,
> + .file = file,
> + .start = vma->vm_start,
> + .end = vma->vm_end,
> +
> + .pgoff = vma->vm_pgoff,
> + .vm_file = vma->vm_file,
> + .vma_flags = vma->flags,
> + .page_prot = vma->vm_page_prot,
> +
> + .action.type = MMAP_NOTHING, /* Default */
> + };
> + int err;
> +
> + err = vfs_mmap_prepare(file, &desc);
> + if (err)
> + return err;
> +
> + err = mmap_action_prepare(&desc, &desc.action);
> + if (err)
> + return err;
> +
> + set_vma_from_desc(vma, &desc);
> + return mmap_action_complete(vma, &desc.action);
> +}
> +
> +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> +{
> + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> + void *vm_private_data = vma->vm_private_data;
> + int err;
> +
> + if (!vm_ops->mapped)
> + return 0;
> +
Hello!
Can vm_ops be NULL here? __compat_vma_mapped() is called from
compat_vma_mmap(), which is reached when a filesystem provides
mmap_prepare. If the mmap_prepare hook does not set desc->vm_ops,
vma->vm_ops will be NULL and this dereferences a NULL pointer.
For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
a NULL pointer dereference here.
Would need to do
if (!vm_ops || !vm_ops->mapped)
return 0;
here
> + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> + &vm_private_data);
> + if (err)
> + unmap_vma_locked(vma);
When mapped() returns an error, unmap_vma_locked(vma) is called, but
execution continues into the vm_private_data update below. After
unmap_vma_locked() the VMA may be freed (do_munmap() can remove the VMA
entirely), so accessing vma->vm_private_data after that is a
use-after-free.
Probably need to do:
if (err) {
unmap_vma_locked(vma);
return err;
}
> + /* Update private data if changed. */
> + if (vm_private_data != vma->vm_private_data)
> + vma->vm_private_data = vm_private_data;
> +
> + return err;
> +}
> +
> /**
> * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> * existing VMA and execute any requested actions.
> @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> */
> int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> {
> - struct vm_area_desc desc = {
> - .mm = vma->vm_mm,
> - .file = file,
> - .start = vma->vm_start,
> - .end = vma->vm_end,
> -
> - .pgoff = vma->vm_pgoff,
> - .vm_file = vma->vm_file,
> - .vma_flags = vma->flags,
> - .page_prot = vma->vm_page_prot,
> -
> - .action.type = MMAP_NOTHING, /* Default */
> - };
> int err;
>
> - err = vfs_mmap_prepare(file, &desc);
> - if (err)
> - return err;
> -
> - err = mmap_action_prepare(&desc, &desc.action);
> + err = __compat_vma_mmap(file, vma);
> if (err)
> return err;
>
> - set_vma_from_desc(vma, &desc);
> - return mmap_action_complete(vma, &desc.action);
> + return __compat_vma_mapped(file, vma);
> }
> EXPORT_SYMBOL(compat_vma_mmap);
>
> +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> +{
> + /* vm_ops->mapped is not valid if mmap() is specified. */
> + if (WARN_ON_ONCE(vma->vm_ops->mapped))
> + return -EINVAL;
I think vma->vm_ops can be NULL here. Should be:
if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
return -EINVAL;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(__vma_check_mmap_hook);
> +
> static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> const struct page *page)
> {
> @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> * invoked if we do NOT merge, so we only clean up the VMA we created.
> */
> if (err) {
> - const size_t len = vma_pages(vma) << PAGE_SHIFT;
> -
> - do_munmap(current->mm, vma->vm_start, len, NULL);
> -
> + unmap_vma_locked(vma);
> if (action->error_hook) {
> /* We may want to filter the error. */
> err = action->error_hook(err);
> diff --git a/mm/vma.c b/mm/vma.c
> index 054cf1d262fb..ef9f5a5365d1 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> return false;
> }
>
> -static int call_action_complete(struct mmap_state *map,
> - struct mmap_action *action,
> - struct vm_area_struct *vma)
> +static int call_mapped_hook(struct vm_area_struct *vma)
> {
> - int ret;
> + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> + void *vm_private_data = vma->vm_private_data;
> + int err;
>
> - ret = mmap_action_complete(vma, action);
> + if (!vm_ops || !vm_ops->mapped)
> + return 0;
> + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> + vma->vm_file, &vm_private_data);
> + if (err) {
> + unmap_vma_locked(vma);
> + return err;
> + }
> + /* Update private data if changed. */
> + if (vm_private_data != vma->vm_private_data)
> + vma->vm_private_data = vm_private_data;
> + return 0;
> +}
>
> - /* If we held the file rmap we need to release it. */
> - if (map->hold_file_rmap_lock) {
> - struct file *file = vma->vm_file;
> +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> + struct vm_area_struct *vma)
> +{
> + struct file *file;
>
> - i_mmap_unlock_write(file->f_mapping);
> - }
> - return ret;
> + if (!map->hold_file_rmap_lock)
> + return;
> + file = vma->vm_file;
> + i_mmap_unlock_write(file->f_mapping);
> }
>
> static unsigned long __mmap_region(struct file *file, unsigned long addr,
> @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> __mmap_complete(&map, vma);
>
> if (have_mmap_prepare && allocated_new) {
> - error = call_action_complete(&map, &desc.action, vma);
> + error = mmap_action_complete(vma, &desc.action);
> + if (!error)
> + error = call_mapped_hook(vma);
>
> + maybe_drop_file_rmap_lock(&map, vma);
> if (error)
> return error;
> }
> diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> index 908beb263307..47d8db809f31 100644
> --- a/tools/testing/vma/include/dup.h
> +++ b/tools/testing/vma/include/dup.h
> @@ -606,12 +606,34 @@ struct vm_area_struct {
> } __randomize_layout;
>
> struct vm_operations_struct {
> - void (*open)(struct vm_area_struct * area);
> + /**
> + * @open: Called when a VMA is remapped or split. Not called upon first
> + * mapping a VMA.
> + * Context: User context. May sleep. Caller holds mmap_lock.
> + */
> + void (*open)(struct vm_area_struct *vma);
> /**
> * @close: Called when the VMA is being removed from the MM.
> * Context: User context. May sleep. Caller holds mmap_lock.
> */
> - void (*close)(struct vm_area_struct * area);
> + void (*close)(struct vm_area_struct *vma);
> + /**
> + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> + * the new VMA is merged with an adjacent VMA.
> + *
> + * The @vm_private_data field is an output field allowing the user to
> + * modify vma->vm_private_data as necessary.
> + *
> + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> + * set from f_op->mmap.
> + *
> + * Returns %0 on success, or an error otherwise. On error, the VMA will
> + * be unmapped.
> + *
> + * Context: User context. May sleep. Caller holds mmap_lock.
> + */
> + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> + const struct file *file, void **vm_private_data);
> /* Called any time before splitting to check if it's allowed */
> int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> int (*mremap)(struct vm_area_struct *area);
> @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> swap(vma->vm_file, file);
> fput(file);
> }
> +
> +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> +{
> + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> + mmap_assert_locked(vma->vm_mm);
> + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> +}
> --
> 2.53.0
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-12 20:27 ` [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure Lorenzo Stoakes (Oracle)
@ 2026-03-13 11:07 ` Usama Arif
2026-03-13 12:00 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Usama Arif @ 2026-03-13 11:07 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> the deprecated mmap callback.
>
> However, it did not account for the fact that mmap_prepare can fail to map
> due to an out of memory error, and thus should not be incrementing a
> reference count on mmap_prepare.
>
> With the newly added vm_ops->mapped callback available, we can simply defer
> this operation to that callback which is only invoked once the mapping is
> successfully in place (but not yet visible to userspace as the mmap and VMA
> write locks are held).
>
> Therefore add afs_mapped() to implement this callback for AFS.
>
> In practice the mapping allocations are 'too small to fail', so this is
> something that should realistically never happen (or would do so only in
> a case where the process is about to die anyway), but we should still
> handle it.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> fs/afs/file.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index f609366fd2ac..69ef86f5e274 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> static void afs_vm_open(struct vm_area_struct *area);
> static void afs_vm_close(struct vm_area_struct *area);
> static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> + const struct file *file, void **vm_private_data);
>
> const struct file_operations afs_file_operations = {
> .open = afs_open,
> @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> };
>
> static const struct vm_operations_struct afs_vm_ops = {
> + .mapped = afs_mapped,
> .open = afs_vm_open,
> .close = afs_vm_close,
> .fault = filemap_fault,
> @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> afs_add_open_mmap(vnode);
Is the above afs_add_open_mmap an additional one, which could cause a reference
leak? Does the above one need to be removed, with only the one in afs_mapped()
kept?
>
> ret = generic_file_mmap_prepare(desc);
> - if (ret == 0)
> - desc->vm_ops = &afs_vm_ops;
> - else
> - afs_drop_open_mmap(vnode);
> + if (ret)
> + return ret;
> +
> + desc->vm_ops = &afs_vm_ops;
> return ret;
> }
>
> +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> + const struct file *file, void **vm_private_data)
> +{
> + struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> +
> + afs_add_open_mmap(vnode);
> + return 0;
> +}
> +
> static void afs_vm_open(struct vm_area_struct *vma)
> {
> afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> --
> 2.53.0
>
>
* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-13 11:02 ` Usama Arif
@ 2026-03-13 11:58 ` Lorenzo Stoakes (Oracle)
2026-03-16 2:18 ` Suren Baghdasaryan
0 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-13 11:58 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, Clemens Ladisch, Arnd Bergmann, Greg Kroah-Hartman,
K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Alexander Shishkin, Maxime Coquelin, Alexandre Torgue,
Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
Bodo Stroesser, Martin K . Petersen, David Howells, Marc Dionne,
Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Fri, Mar 13, 2026 at 04:02:36AM -0700, Usama Arif wrote:
> On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > Previously, when a driver needed to do something like establish a reference
> > count, it could do so in the mmap hook in the knowledge that the mapping
> > would succeed.
> >
> > With the introduction of f_op->mmap_prepare this is no longer the case, as
> > it is invoked prior to actually establishing the mapping.
> >
> > To take this into account, introduce a new vm_ops->mapped callback which is
> > invoked when the VMA is first mapped (though notably - not when it is
> > merged - which is correct and mirrors existing mmap/open/close behaviour).
> >
> > We do better than vm_ops->open() here, as this callback can return an
> > error, at which point the VMA will be unmapped.
> >
> > Note that vm_ops->mapped() is invoked after any mmap action is
> > complete (such as I/O remapping).
> >
> > We intentionally do not expose the VMA at this point, exposing only the
> > fields that could be used, and an output parameter in case the operation
> > needs to update the vma->vm_private_data field.
> >
> > In order to deal with stacked filesystems which invoke inner filesystem's
> > mmap() invocations, add __compat_vma_mapped() and invoke it on
> > vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> > handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> > callback.
> >
> > We can now also remove call_action_complete() and invoke
> > mmap_action_complete() directly, as we separate out the rmap lock logic to
> > be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> >
> > We also abstract unmapping of a VMA on mmap action completion into its own
> > helper function, unmap_vma_locked().
> >
> > Additionally, update VMA userland test headers to reflect the change.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > include/linux/fs.h | 9 +++-
> > include/linux/mm.h | 17 +++++++
> > mm/internal.h | 10 ++++
> > mm/util.c | 86 ++++++++++++++++++++++++---------
> > mm/vma.c | 41 +++++++++++-----
> > tools/testing/vma/include/dup.h | 34 ++++++++++++-
> > 6 files changed, 158 insertions(+), 39 deletions(-)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a2628a12bd2b..c390f5c667e3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> > }
> >
> > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > +int __vma_check_mmap_hook(struct vm_area_struct *vma);
> >
> > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> > + int err;
> > +
> > if (file->f_op->mmap_prepare)
> > return compat_vma_mmap(file, vma);
> >
> > - return file->f_op->mmap(file, vma);
> > + err = file->f_op->mmap(file, vma);
> > + if (err)
> > + return err;
> > +
> > + return __vma_check_mmap_hook(vma);
> > }
> >
> > static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 12a0b4c63736..7333d5db1221 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -759,6 +759,23 @@ struct vm_operations_struct {
> > * Context: User context. May sleep. Caller holds mmap_lock.
> > */
> > void (*close)(struct vm_area_struct *vma);
> > + /**
> > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > + * the new VMA is merged with an adjacent VMA.
> > + *
> > + * The @vm_private_data field is an output field allowing the user to
> > + * modify vma->vm_private_data as necessary.
> > + *
> > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > + * set from f_op->mmap.
> > + *
> > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > + * be unmapped.
> > + *
> > + * Context: User context. May sleep. Caller holds mmap_lock.
> > + */
> > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > + const struct file *file, void **vm_private_data);
> > /* Called any time before splitting to check if it's allowed */
> > int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> > int (*mremap)(struct vm_area_struct *vma);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 7bfa85b5e78b..f0f2cf1caa36 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> > * mmap hook and safely handle error conditions. On error, VMA hooks will be
> > * mutated.
> > *
> > + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> > + *
> > * @file: File which backs the mapping.
> > * @vma: VMA which we are mapping.
> > *
> > @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> > /* unmap_vmas is in mm/memory.c */
> > void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
> >
> > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > +{
> > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > + mmap_assert_locked(vma->vm_mm);
> > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > +}
> > +
> > #ifdef CONFIG_MMU
> >
> > static inline void get_anon_vma(struct anon_vma *anon_vma)
> > diff --git a/mm/util.c b/mm/util.c
> > index dba1191725b6..2b0ed54008d6 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> > EXPORT_SYMBOL(flush_dcache_folio);
> > #endif
> >
> > +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + struct vm_area_desc desc = {
> > + .mm = vma->vm_mm,
> > + .file = file,
> > + .start = vma->vm_start,
> > + .end = vma->vm_end,
> > +
> > + .pgoff = vma->vm_pgoff,
> > + .vm_file = vma->vm_file,
> > + .vma_flags = vma->flags,
> > + .page_prot = vma->vm_page_prot,
> > +
> > + .action.type = MMAP_NOTHING, /* Default */
> > + };
> > + int err;
> > +
> > + err = vfs_mmap_prepare(file, &desc);
> > + if (err)
> > + return err;
> > +
> > + err = mmap_action_prepare(&desc, &desc.action);
> > + if (err)
> > + return err;
> > +
> > + set_vma_from_desc(vma, &desc);
> > + return mmap_action_complete(vma, &desc.action);
> > +}
> > +
> > +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> > +{
> > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > + void *vm_private_data = vma->vm_private_data;
> > + int err;
> > +
> > + if (!vm_ops->mapped)
> > + return 0;
> > +
>
> Hello!
>
> Can vm_ops be NULL here? __compat_vma_mapped() is called from
> compat_vma_mmap(), which is reached when a filesystem provides
> mmap_prepare. If the mmap_prepare hook does not set desc->vm_ops,
> vma->vm_ops will be NULL and this dereferences a NULL pointer.
I _think_ for this to ever be invoked, you would need to be dealing with a
file-backed VMA so vm_ops->fault would HAVE to be defined.
But you're right. As a matter of principle we should check it anyway! Will fix.
>
> For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
> a NULL pointer dereference here.
>
> Would need to do
> if (!vm_ops || !vm_ops->mapped)
> return 0;
>
> here
Yes.
>
>
> > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> > + &vm_private_data);
> > + if (err)
> > + unmap_vma_locked(vma);
>
> when mapped() returns an error, unmap_vma_locked(vma) is called
> but execution continues into the vm_private_data update below. After
> unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
> entirely), so accessing vma->vm_private_data after that is a
> use-after-free.
Very good point :) will fix thanks!
Probably:
if (err)
unmap_vma_locked(vma);
else if (vm_private_data != vma->vm_private_data)
vma->vm_private_data = vm_private_data;
return err;
Would be fine.
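Combining the two review points (the NULL vm_ops guard and avoiding the use-after-free on the error path), the resulting control flow might look like the user-space model below. This is a sketch only: the struct and hook signature are simplified stand-ins for the kernel types, and unmap_vma_locked() is modeled as setting a flag rather than actually tearing down a mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; illustrative only. */
struct model_vma {
	void *vm_private_data;
	int unmapped;                           /* set by modeled unmap_vma_locked() */
	int (*mapped)(void **vm_private_data);  /* NULL models a missing hook */
};

static void model_unmap_vma_locked(struct model_vma *vma)
{
	vma->unmapped = 1;
}

/*
 * Models the corrected __compat_vma_mapped(): guard against a missing
 * hook, and never touch the VMA after it has been unmapped on error.
 */
static int model_compat_vma_mapped(struct model_vma *vma)
{
	void *vm_private_data;
	int err;

	if (!vma->mapped)       /* models !vm_ops || !vm_ops->mapped */
		return 0;

	vm_private_data = vma->vm_private_data;
	err = vma->mapped(&vm_private_data);
	if (err) {
		model_unmap_vma_locked(vma);
		return err;     /* do NOT write vma->vm_private_data here */
	}
	if (vm_private_data != vma->vm_private_data)
		vma->vm_private_data = vm_private_data;
	return 0;
}

static int hook_fail(void **priv) { (void)priv; return -12; /* -ENOMEM */ }
static int hook_ok(void **priv) { *priv = (void *)0x1; return 0; }
```

On failure the VMA is torn down and the stale private-data pointer is never written back; on success the private-data update still happens.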
>
> Probably need to do:
> if (err) {
> unmap_vma_locked(vma);
> return err;
> }
>
> > + /* Update private data if changed. */
> > + if (vm_private_data != vma->vm_private_data)
> > + vma->vm_private_data = vm_private_data;
> > +
> > + return err;
> > +}
> > +
> > /**
> > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > * existing VMA and execute any requested actions.
> > @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> > */
> > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> > - struct vm_area_desc desc = {
> > - .mm = vma->vm_mm,
> > - .file = file,
> > - .start = vma->vm_start,
> > - .end = vma->vm_end,
> > -
> > - .pgoff = vma->vm_pgoff,
> > - .vm_file = vma->vm_file,
> > - .vma_flags = vma->flags,
> > - .page_prot = vma->vm_page_prot,
> > -
> > - .action.type = MMAP_NOTHING, /* Default */
> > - };
> > int err;
> >
> > - err = vfs_mmap_prepare(file, &desc);
> > - if (err)
> > - return err;
> > -
> > - err = mmap_action_prepare(&desc, &desc.action);
> > + err = __compat_vma_mmap(file, vma);
> > if (err)
> > return err;
> >
> > - set_vma_from_desc(vma, &desc);
> > - return mmap_action_complete(vma, &desc.action);
> > + return __compat_vma_mapped(file, vma);
> > }
> > EXPORT_SYMBOL(compat_vma_mmap);
> >
> > +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> > +{
> > + /* vm_ops->mapped is not valid if mmap() is specified. */
> > + if (WARN_ON_ONCE(vma->vm_ops->mapped))
> > + return -EINVAL;
>
> I think vma->vm_ops can be NULL here. Should be:
>
> if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
> return -EINVAL;
I think again you'd probably only invoke this on file-backed VMAs so it would
be ok, but again, as a matter of principle we should check it, so will fix, thanks!
>
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(__vma_check_mmap_hook);
> > +
> > static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> > const struct page *page)
> > {
> > @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> > * invoked if we do NOT merge, so we only clean up the VMA we created.
> > */
> > if (err) {
> > - const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > -
> > - do_munmap(current->mm, vma->vm_start, len, NULL);
> > -
> > + unmap_vma_locked(vma);
> > if (action->error_hook) {
> > /* We may want to filter the error. */
> > err = action->error_hook(err);
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 054cf1d262fb..ef9f5a5365d1 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > return false;
> > }
> >
> > -static int call_action_complete(struct mmap_state *map,
> > - struct mmap_action *action,
> > - struct vm_area_struct *vma)
> > +static int call_mapped_hook(struct vm_area_struct *vma)
> > {
> > - int ret;
> > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > + void *vm_private_data = vma->vm_private_data;
> > + int err;
> >
> > - ret = mmap_action_complete(vma, action);
> > + if (!vm_ops || !vm_ops->mapped)
> > + return 0;
> > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> > + vma->vm_file, &vm_private_data);
> > + if (err) {
> > + unmap_vma_locked(vma);
> > + return err;
> > + }
> > + /* Update private data if changed. */
> > + if (vm_private_data != vma->vm_private_data)
> > + vma->vm_private_data = vm_private_data;
> > + return 0;
> > +}
> >
> > - /* If we held the file rmap we need to release it. */
> > - if (map->hold_file_rmap_lock) {
> > - struct file *file = vma->vm_file;
> > +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> > + struct vm_area_struct *vma)
> > +{
> > + struct file *file;
> >
> > - i_mmap_unlock_write(file->f_mapping);
> > - }
> > - return ret;
> > + if (!map->hold_file_rmap_lock)
> > + return;
> > + file = vma->vm_file;
> > + i_mmap_unlock_write(file->f_mapping);
> > }
> >
> > static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > __mmap_complete(&map, vma);
> >
> > if (have_mmap_prepare && allocated_new) {
> > - error = call_action_complete(&map, &desc.action, vma);
> > + error = mmap_action_complete(vma, &desc.action);
> > + if (!error)
> > + error = call_mapped_hook(vma);
> >
> > + maybe_drop_file_rmap_lock(&map, vma);
> > if (error)
> > return error;
> > }
> > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > index 908beb263307..47d8db809f31 100644
> > --- a/tools/testing/vma/include/dup.h
> > +++ b/tools/testing/vma/include/dup.h
> > @@ -606,12 +606,34 @@ struct vm_area_struct {
> > } __randomize_layout;
> >
> > struct vm_operations_struct {
> > - void (*open)(struct vm_area_struct * area);
> > + /**
> > + * @open: Called when a VMA is remapped or split. Not called upon first
> > + * mapping a VMA.
> > + * Context: User context. May sleep. Caller holds mmap_lock.
> > + */
> > + void (*open)(struct vm_area_struct *vma);
> > /**
> > * @close: Called when the VMA is being removed from the MM.
> > * Context: User context. May sleep. Caller holds mmap_lock.
> > */
> > - void (*close)(struct vm_area_struct * area);
> > + void (*close)(struct vm_area_struct *vma);
> > + /**
> > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > + * the new VMA is merged with an adjacent VMA.
> > + *
> > + * The @vm_private_data field is an output field allowing the user to
> > + * modify vma->vm_private_data as necessary.
> > + *
> > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > + * set from f_op->mmap.
> > + *
> > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > + * be unmapped.
> > + *
> > + * Context: User context. May sleep. Caller holds mmap_lock.
> > + */
> > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > + const struct file *file, void **vm_private_data);
> > /* Called any time before splitting to check if it's allowed */
> > int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> > int (*mremap)(struct vm_area_struct *area);
> > @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> > swap(vma->vm_file, file);
> > fput(file);
> > }
> > +
> > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > +{
> > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > + mmap_assert_locked(vma->vm_mm);
> > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > +}
> > --
> > 2.53.0
> >
> >
Cheers, Lorenzo
* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-13 11:07 ` Usama Arif
@ 2026-03-13 12:00 ` Lorenzo Stoakes (Oracle)
2026-03-16 2:32 ` Suren Baghdasaryan
0 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-13 12:00 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, Clemens Ladisch, Arnd Bergmann, Greg Kroah-Hartman,
K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Alexander Shishkin, Maxime Coquelin, Alexandre Torgue,
Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
Bodo Stroesser, Martin K . Petersen, David Howells, Marc Dionne,
Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > the deprecated mmap callback.
> >
> > However, it did not account for the fact that mmap_prepare can fail to map
> > due to an out of memory error, and thus should not be incrementing a
> > reference count on mmap_prepare.
> >
> > With the newly added vm_ops->mapped callback available, we can simply defer
> > this operation to that callback which is only invoked once the mapping is
> > successfully in place (but not yet visible to userspace as the mmap and VMA
> > write locks are held).
> >
> > Therefore add afs_mapped() to implement this callback for AFS.
> >
> > In practice the mapping allocations are 'too small to fail', so this is
> > something that should realistically never happen (or would do so only in
> > a case where the process is about to die anyway), but we should still
> > handle it.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > fs/afs/file.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > index f609366fd2ac..69ef86f5e274 100644
> > --- a/fs/afs/file.c
> > +++ b/fs/afs/file.c
> > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > static void afs_vm_open(struct vm_area_struct *area);
> > static void afs_vm_close(struct vm_area_struct *area);
> > static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > + const struct file *file, void **vm_private_data);
> >
> > const struct file_operations afs_file_operations = {
> > .open = afs_open,
> > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > };
> >
> > static const struct vm_operations_struct afs_vm_ops = {
> > + .mapped = afs_mapped,
> > .open = afs_vm_open,
> > .close = afs_vm_close,
> > .fault = filemap_fault,
> > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > afs_add_open_mmap(vnode);
>
> Is the above afs_add_open_mmap an additional one, which could cause a reference
> leak? Does the above one need to be removed, with only the one in afs_mapped()
> kept?
Ah yeah good spot, will fix thanks!
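With the duplicate call dropped, the reference is taken only in the ->mapped() hook, which runs once the mapping is known to exist. The user-space model below tracks the open-mmap count through the failure and success paths; the names are stand-ins for the real afs helpers, not the actual kernel code.

```c
#include <assert.h>

/* Models the vnode's open-mmap accounting with a simple counter. */
static int open_mmap_count;

static void model_add_open_mmap(void)
{
	open_mmap_count++;
}

/* Corrected mmap_prepare: takes NO reference; may fail. */
static int model_afs_mmap_prepare(int generic_ret)
{
	/* no model_add_open_mmap() here any more */
	return generic_ret;
}

/*
 * ->mapped() runs only once the mapping is successfully in place,
 * so the reference is taken exactly when it is known to be needed.
 */
static int model_afs_mapped(void)
{
	model_add_open_mmap();
	return 0;
}

static int model_mmap(int generic_ret)
{
	int err = model_afs_mmap_prepare(generic_ret);

	if (err)
		return err;    /* failure path takes no reference at all */
	return model_afs_mapped();
}
```

A failed generic_file_mmap_prepare() now leaves the count untouched, while a successful mapping takes exactly one reference.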
>
> >
> > ret = generic_file_mmap_prepare(desc);
> > - if (ret == 0)
> > - desc->vm_ops = &afs_vm_ops;
> > - else
> > - afs_drop_open_mmap(vnode);
> > + if (ret)
> > + return ret;
> > +
> > + desc->vm_ops = &afs_vm_ops;
> > return ret;
> > }
> >
> > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > + const struct file *file, void **vm_private_data)
> > +{
> > + struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > +
> > + afs_add_open_mmap(vnode);
> > + return 0;
> > +}
> > +
> > static void afs_vm_open(struct vm_area_struct *vma)
> > {
> > afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > --
> > 2.53.0
> >
> >
Cheers, Lorenzo
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-12 21:14 ` Andrew Morton
@ 2026-03-13 12:13 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-13 12:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, Mar 12, 2026 at 02:14:25PM -0700, Andrew Morton wrote:
> On Thu, 12 Mar 2026 20:27:16 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > +int mmap_action_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
> > +
> > {
> > switch (action->type) {
> > case MMAP_NOTHING:
> > - break;
> > + return 0;
> > case MMAP_REMAP_PFN:
> > - remap_pfn_range_prepare(desc, action->remap.start_pfn);
> > - break;
> > + return remap_pfn_range_prepare(desc, action);
> > case MMAP_IO_REMAP_PFN:
> > - io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> > - action->remap.size);
> > - break;
> > + return io_remap_pfn_range_prepare(desc, action);
> > }
> > }
> > EXPORT_SYMBOL(mmap_action_prepare);
>
> hm, was this the correct version?
>
> mm/util.c: In function 'mmap_action_prepare':
> mm/util.c:1451:1: error: control reaches end of non-void function [-Werror=return-type]
> 1451 | }
Seems different compiler versions do different things :)
In theory we should never hit that but memory corruption and err... rogue
drivers? could cause it ofc :)
Will fix on respin.
>
> --- a/mm/util.c~mm-various-small-mmap_prepare-cleanups-fix
> +++ a/mm/util.c
> @@ -1356,6 +1356,8 @@ int mmap_action_prepare(struct vm_area_d
> return remap_pfn_range_prepare(desc, action);
> case MMAP_IO_REMAP_PFN:
> return io_remap_pfn_range_prepare(desc, action);
> + default:
> + BUG();
I'd probably prefer a WARN_ON_ONCE(1) and return -EBLAH; will think about it on
respin.
> }
> }
> EXPORT_SYMBOL(mmap_action_prepare);
> _
>
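The warn-and-fail variant being discussed (rather than BUG()) might be sketched as below, modeled in user space; the enum values mirror the patch, but the choice of -EINVAL for the impossible default arm is an assumption.

```c
#include <assert.h>
#include <errno.h>

enum model_action_type { MMAP_NOTHING, MMAP_REMAP_PFN, MMAP_IO_REMAP_PFN };

/*
 * Models the defensive default arm in mmap_action_prepare(): rather than
 * BUG() on a corrupted or bogus action type, warn once and fail the mmap
 * with an error the caller can propagate.
 */
static int model_action_prepare(int type)
{
	switch (type) {
	case MMAP_NOTHING:
		return 0;
	case MMAP_REMAP_PFN:
	case MMAP_IO_REMAP_PFN:
		return 0;       /* stands in for the remap prepare calls */
	default:
		/* WARN_ON_ONCE(1) in the kernel */
		return -EINVAL;
	}
}
```

This also silences the -Werror=return-type complaint, since every path now returns a value.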
Cheers, Lorenzo
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-12 20:27 ` [PATCH 01/15] mm: various small mmap_prepare cleanups Lorenzo Stoakes (Oracle)
2026-03-12 21:14 ` Andrew Morton
@ 2026-03-15 22:56 ` Suren Baghdasaryan
2026-03-15 23:06 ` Suren Baghdasaryan
2026-03-16 14:44 ` Lorenzo Stoakes (Oracle)
1 sibling, 2 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-15 22:56 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> Rather than passing arbitrary fields, pass an mmap_action field directly to
> mmap prepare and complete helpers to put all the action-specific logic in
> the function actually doing the work.
>
> Additionally, allow mmap prepare functions to return an error so we can
> error out as soon as possible if there is something logically incorrect in
> the input.
>
> Update remap_pfn_range_prepare() to properly check the input range for the
> CoW case.
By "properly check" do you mean the replacement of desc->start and
desc->end with action->remap.start and action->remap.start +
action->remap.size when calling get_remap_pgoff() from
remap_pfn_range_prepare()?
>
> While we're here, make remap_pfn_range_prepare_vma() a little neater, and
> pass mmap_action directly to call_action_complete().
>
> Then, update compat_vma_mmap() to perform its logic directly, as
> __compat_vma_map() is not used by anything so we don't need to export it.
Not directly related to this patch, but while reviewing I was also
checking vma locking rules in this mmap_prepare() + mmap() sequence,
and I noticed that the new VMA flag modification functions like
vma_set_flags_mask() do not assert vma_assert_locked(vma). It would be
useful to add these asserts, but as a separate change. I will add it to
my todo list.
>
> Also update compat_vma_mmap() to use vfs_mmap_prepare() rather than calling
> the mmap_prepare op directly.
>
> Finally, update the VMA userland tests to reflect the changes.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> include/linux/fs.h | 2 -
> include/linux/mm.h | 8 +--
> mm/internal.h | 28 +++++---
> mm/memory.c | 45 +++++++-----
> mm/util.c | 112 +++++++++++++-----------------
> mm/vma.c | 21 +++---
> tools/testing/vma/include/dup.h | 9 ++-
> tools/testing/vma/include/stubs.h | 9 +--
> 8 files changed, 123 insertions(+), 111 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8b3dd145b25e..a2628a12bd2b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2058,8 +2058,6 @@ static inline bool can_mmap_file(struct file *file)
> return true;
> }
>
> -int __compat_vma_mmap(const struct file_operations *f_op,
> - struct file *file, struct vm_area_struct *vma);
> int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
>
> static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4c4fd55fc823..cc5960a84382 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4116,10 +4116,10 @@ static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
> mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
> }
>
> -void mmap_action_prepare(struct mmap_action *action,
> - struct vm_area_desc *desc);
> -int mmap_action_complete(struct mmap_action *action,
> - struct vm_area_struct *vma);
> +int mmap_action_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action);
> +int mmap_action_complete(struct vm_area_struct *vma,
> + struct mmap_action *action);
>
> /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> diff --git a/mm/internal.h b/mm/internal.h
> index 95b583e7e4f7..7bfa85b5e78b 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1775,26 +1775,32 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
> void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
> int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
>
> -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
> -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> - unsigned long pfn, unsigned long size, pgprot_t pgprot);
> +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action);
> +int remap_pfn_range_complete(struct vm_area_struct *vma,
> + struct mmap_action *action);
>
> -static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> - unsigned long orig_pfn, unsigned long size)
> +static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action)
> {
> + const unsigned long orig_pfn = action->remap.start_pfn;
> + const unsigned long size = action->remap.size;
> const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
>
> - return remap_pfn_range_prepare(desc, pfn);
> + action->remap.start_pfn = pfn;
> + return remap_pfn_range_prepare(desc, action);
> }
>
> static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
> - unsigned long addr, unsigned long orig_pfn, unsigned long size,
> - pgprot_t orig_prot)
> + struct mmap_action *action)
> {
> - const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> - const pgprot_t prot = pgprot_decrypted(orig_prot);
> + const unsigned long size = action->remap.size;
> + const unsigned long orig_pfn = action->remap.start_pfn;
> + const pgprot_t orig_prot = vma->vm_page_prot;
>
> - return remap_pfn_range_complete(vma, addr, pfn, size, prot);
> + action->remap.pgprot = pgprot_decrypted(orig_prot);
> + action->remap.start_pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> + return remap_pfn_range_complete(vma, action);
> }
>
> #ifdef CONFIG_MMU_NOTIFIER
> diff --git a/mm/memory.c b/mm/memory.c
> index 6aa0ea4af1fc..364fa8a45360 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3099,26 +3099,34 @@ static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> }
> #endif
>
> -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
> +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action)
> {
> - /*
> - * We set addr=VMA start, end=VMA end here, so this won't fail, but we
> - * check it again on complete and will fail there if specified addr is
> - * invalid.
> - */
> - get_remap_pgoff(vma_desc_is_cow_mapping(desc), desc->start, desc->end,
> - desc->start, desc->end, pfn, &desc->pgoff);
> + const unsigned long start = action->remap.start;
> + const unsigned long end = start + action->remap.size;
> + const unsigned long pfn = action->remap.start_pfn;
> + const bool is_cow = vma_desc_is_cow_mapping(desc);
I was trying to figure out who sets action->remap.start and
action->remap.size, and whether they are somehow guaranteed to always
equal desc->start and (desc->end - desc->start). My understanding is
that action->remap.start and action->remap.size are set by
f_op->mmap_prepare(), but I'm not sure they always match desc->start
and (desc->end - desc->start) - and if they must, how do we enforce
that?
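For what it's worth, here is a minimal userspace model of the CoW constraint in question (stubbed types and a hypothetical helper name, `get_remap_pgoff_model()` - this is not the actual kernel implementation, just a sketch of the check as I understand it):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace sketch (NOT kernel code) of the constraint that
 * get_remap_pgoff() appears to enforce for CoW mappings: the remapped
 * range must cover the whole VMA so that vm_pgoff can record the first
 * PFN for later untracking.
 */
static int get_remap_pgoff_model(bool is_cow, unsigned long addr,
				 unsigned long end, unsigned long vm_start,
				 unsigned long vm_end, unsigned long pfn,
				 unsigned long *pgoff)
{
	if (is_cow) {
		/* A partial-range CoW remap cannot be tracked via vm_pgoff. */
		if (addr != vm_start || end != vm_end)
			return -EINVAL;
		*pgoff = pfn;
	}
	return 0;
}
```

With desc->start/desc->end passed as both the range and the VMA bounds (as the old code did), the CoW branch can never fail; passing action->remap.start/size instead lets a mismatched sub-range fail at prepare time - which is presumably the point, hence the question about who guarantees the two agree.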
> + int err;
> +
> + err = get_remap_pgoff(is_cow, start, end, desc->start, desc->end, pfn,
> + &desc->pgoff);
> + if (err)
> + return err;
> +
> vma_desc_set_flags_mask(desc, VMA_REMAP_FLAGS);
> + return 0;
> }
>
> -static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
> - unsigned long pfn, unsigned long size)
> +static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma,
> + unsigned long addr, unsigned long pfn,
> + unsigned long size)
> {
> - unsigned long end = addr + PAGE_ALIGN(size);
> + const unsigned long end = addr + PAGE_ALIGN(size);
> + const bool is_cow = is_cow_mapping(vma->vm_flags);
> int err;
>
> - err = get_remap_pgoff(is_cow_mapping(vma->vm_flags), addr, end,
> - vma->vm_start, vma->vm_end, pfn, &vma->vm_pgoff);
> + err = get_remap_pgoff(is_cow, addr, end, vma->vm_start, vma->vm_end,
> + pfn, &vma->vm_pgoff);
> if (err)
> return err;
>
> @@ -3151,10 +3159,15 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> }
> EXPORT_SYMBOL(remap_pfn_range);
>
> -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> - unsigned long pfn, unsigned long size, pgprot_t prot)
> +int remap_pfn_range_complete(struct vm_area_struct *vma,
> + struct mmap_action *action)
> {
> - return do_remap_pfn_range(vma, addr, pfn, size, prot);
> + const unsigned long start = action->remap.start;
> + const unsigned long pfn = action->remap.start_pfn;
> + const unsigned long size = action->remap.size;
> + const pgprot_t prot = action->remap.pgprot;
> +
> + return do_remap_pfn_range(vma, start, pfn, size, prot);
> }
>
> /**
> diff --git a/mm/util.c b/mm/util.c
> index ce7ae80047cf..dba1191725b6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1163,43 +1163,6 @@ void flush_dcache_folio(struct folio *folio)
> EXPORT_SYMBOL(flush_dcache_folio);
> #endif
>
> -/**
> - * __compat_vma_mmap() - See description for compat_vma_mmap()
> - * for details. This is the same operation, only with a specific file operations
> - * struct which may or may not be the same as vma->vm_file->f_op.
> - * @f_op: The file operations whose .mmap_prepare() hook is specified.
> - * @file: The file which backs or will back the mapping.
> - * @vma: The VMA to apply the .mmap_prepare() hook to.
> - * Returns: 0 on success or error.
> - */
> -int __compat_vma_mmap(const struct file_operations *f_op,
> - struct file *file, struct vm_area_struct *vma)
> -{
> - struct vm_area_desc desc = {
> - .mm = vma->vm_mm,
> - .file = file,
> - .start = vma->vm_start,
> - .end = vma->vm_end,
> -
> - .pgoff = vma->vm_pgoff,
> - .vm_file = vma->vm_file,
> - .vma_flags = vma->flags,
> - .page_prot = vma->vm_page_prot,
> -
> - .action.type = MMAP_NOTHING, /* Default */
> - };
> - int err;
> -
> - err = f_op->mmap_prepare(&desc);
> - if (err)
> - return err;
> -
> - mmap_action_prepare(&desc.action, &desc);
> - set_vma_from_desc(vma, &desc);
> - return mmap_action_complete(&desc.action, vma);
> -}
> -EXPORT_SYMBOL(__compat_vma_mmap);
> -
> /**
> * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> * existing VMA and execute any requested actions.
> @@ -1228,7 +1191,31 @@ EXPORT_SYMBOL(__compat_vma_mmap);
> */
> int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> {
> - return __compat_vma_mmap(file->f_op, file, vma);
> + struct vm_area_desc desc = {
> + .mm = vma->vm_mm,
> + .file = file,
> + .start = vma->vm_start,
> + .end = vma->vm_end,
> +
> + .pgoff = vma->vm_pgoff,
> + .vm_file = vma->vm_file,
> + .vma_flags = vma->flags,
> + .page_prot = vma->vm_page_prot,
> +
> + .action.type = MMAP_NOTHING, /* Default */
> + };
> + int err;
> +
> + err = vfs_mmap_prepare(file, &desc);
> + if (err)
> + return err;
> +
> + err = mmap_action_prepare(&desc, &desc.action);
> + if (err)
> + return err;
> +
> + set_vma_from_desc(vma, &desc);
> + return mmap_action_complete(vma, &desc.action);
> }
> EXPORT_SYMBOL(compat_vma_mmap);
>
> @@ -1320,8 +1307,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
> }
> }
>
> -static int mmap_action_finish(struct mmap_action *action,
> - const struct vm_area_struct *vma, int err)
> +static int mmap_action_finish(struct vm_area_struct *vma,
> + struct mmap_action *action, int err)
> {
> /*
> * If an error occurs, unmap the VMA altogether and return an error. We
> @@ -1355,35 +1342,36 @@ static int mmap_action_finish(struct mmap_action *action,
> * action which need to be performed.
> * @desc: The VMA descriptor to prepare for @action.
> * @action: The action to perform.
> + *
> + * Returns: 0 on success, otherwise error.
> */
> -void mmap_action_prepare(struct mmap_action *action,
> - struct vm_area_desc *desc)
> +int mmap_action_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action)
Any reason you are swapping the arguments?
It also looks like we always call mmap_action_prepare() with action ==
&desc->action, like this: mmap_action_prepare(&desc, &desc.action). Why
don't we eliminate the action parameter altogether and use desc->action
from inside the function?
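Something like the following, say (minimal stand-in types, purely illustrative - the real struct definitions carry many more fields):

```c
#include <assert.h>

/* Minimal stand-in types - the real kernel definitions differ. */
enum mmap_action_type { MMAP_NOTHING, MMAP_REMAP_PFN, MMAP_IO_REMAP_PFN };

struct mmap_action {
	enum mmap_action_type type;
};

struct vm_area_desc {
	struct mmap_action action;
};

/*
 * Suggested shape: since every caller passes desc->action anyway, take
 * only the descriptor and derive the action internally.
 */
static int mmap_action_prepare(struct vm_area_desc *desc)
{
	struct mmap_action *action = &desc->action;

	switch (action->type) {
	case MMAP_NOTHING:
		return 0;
	case MMAP_REMAP_PFN:
	case MMAP_IO_REMAP_PFN:
		/* The remap preparation itself is elided in this sketch. */
		return 0;
	}
	return -1; /* unreachable for valid action types */
}
```

That removes the possibility of callers ever passing an action that isn't the descriptor's own.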
> +
Nit: extra blank line.
> {
> switch (action->type) {
> case MMAP_NOTHING:
> - break;
> + return 0;
> case MMAP_REMAP_PFN:
> - remap_pfn_range_prepare(desc, action->remap.start_pfn);
> - break;
> + return remap_pfn_range_prepare(desc, action);
> case MMAP_IO_REMAP_PFN:
> - io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> - action->remap.size);
> - break;
> + return io_remap_pfn_range_prepare(desc, action);
> }
> }
> EXPORT_SYMBOL(mmap_action_prepare);
>
> /**
> * mmap_action_complete - Execute VMA descriptor action.
> - * @action: The action to perform.
> * @vma: The VMA to perform the action upon.
> + * @action: The action to perform.
> *
> * Similar to mmap_action_prepare().
> *
> * Return: 0 on success, or error, at which point the VMA will be unmapped.
> */
> -int mmap_action_complete(struct mmap_action *action,
> - struct vm_area_struct *vma)
> +int mmap_action_complete(struct vm_area_struct *vma,
> + struct mmap_action *action)
> +
> {
> int err = 0;
>
> @@ -1391,23 +1379,19 @@ int mmap_action_complete(struct mmap_action *action,
> case MMAP_NOTHING:
> break;
> case MMAP_REMAP_PFN:
> - err = remap_pfn_range_complete(vma, action->remap.start,
> - action->remap.start_pfn, action->remap.size,
> - action->remap.pgprot);
> + err = remap_pfn_range_complete(vma, action);
> break;
> case MMAP_IO_REMAP_PFN:
> - err = io_remap_pfn_range_complete(vma, action->remap.start,
> - action->remap.start_pfn, action->remap.size,
> - action->remap.pgprot);
> + err = io_remap_pfn_range_complete(vma, action);
> break;
> }
>
> - return mmap_action_finish(action, vma, err);
> + return mmap_action_finish(vma, action, err);
> }
> EXPORT_SYMBOL(mmap_action_complete);
> #else
> -void mmap_action_prepare(struct mmap_action *action,
> - struct vm_area_desc *desc)
> +int mmap_action_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action)
> {
> switch (action->type) {
> case MMAP_NOTHING:
> @@ -1417,11 +1401,13 @@ void mmap_action_prepare(struct mmap_action *action,
> WARN_ON_ONCE(1); /* nommu cannot handle these. */
> break;
> }
> +
> + return 0;
> }
> EXPORT_SYMBOL(mmap_action_prepare);
>
> -int mmap_action_complete(struct mmap_action *action,
> - struct vm_area_struct *vma)
> +int mmap_action_complete(struct vm_area_struct *vma,
> + struct mmap_action *action)
> {
> int err = 0;
>
> @@ -1436,7 +1422,7 @@ int mmap_action_complete(struct mmap_action *action,
> break;
> }
>
> - return mmap_action_finish(action, vma, err);
> + return mmap_action_finish(vma, action, err);
> }
> EXPORT_SYMBOL(mmap_action_complete);
> #endif
> diff --git a/mm/vma.c b/mm/vma.c
> index be64f781a3aa..054cf1d262fb 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2613,15 +2613,19 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
> vma_set_page_prot(vma);
> }
>
> -static void call_action_prepare(struct mmap_state *map,
> - struct vm_area_desc *desc)
> +static int call_action_prepare(struct mmap_state *map,
> + struct vm_area_desc *desc)
> {
> struct mmap_action *action = &desc->action;
> + int err;
>
> - mmap_action_prepare(action, desc);
> + err = mmap_action_prepare(desc, action);
> + if (err)
> + return err;
>
> if (action->hide_from_rmap_until_complete)
> map->hold_file_rmap_lock = true;
> + return 0;
> }
>
> /*
> @@ -2645,7 +2649,9 @@ static int call_mmap_prepare(struct mmap_state *map,
> if (err)
> return err;
>
> - call_action_prepare(map, desc);
> + err = call_action_prepare(map, desc);
> + if (err)
> + return err;
>
> /* Update fields permitted to be changed. */
> map->pgoff = desc->pgoff;
> @@ -2700,13 +2706,12 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> }
>
> static int call_action_complete(struct mmap_state *map,
> - struct vm_area_desc *desc,
> + struct mmap_action *action,
> struct vm_area_struct *vma)
> {
> - struct mmap_action *action = &desc->action;
> int ret;
>
> - ret = mmap_action_complete(action, vma);
> + ret = mmap_action_complete(vma, action);
>
> /* If we held the file rmap we need to release it. */
> if (map->hold_file_rmap_lock) {
> @@ -2768,7 +2773,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> __mmap_complete(&map, vma);
>
> if (have_mmap_prepare && allocated_new) {
> - error = call_action_complete(&map, &desc, vma);
> + error = call_action_complete(&map, &desc.action, vma);
>
> if (error)
> return error;
> diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> index 5eb313beb43d..908beb263307 100644
> --- a/tools/testing/vma/include/dup.h
> +++ b/tools/testing/vma/include/dup.h
> @@ -1106,7 +1106,7 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
>
> .pgoff = vma->vm_pgoff,
> .vm_file = vma->vm_file,
> - .vm_flags = vma->vm_flags,
> + .vma_flags = vma->flags,
> .page_prot = vma->vm_page_prot,
>
> .action.type = MMAP_NOTHING, /* Default */
> @@ -1117,9 +1117,12 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
> if (err)
> return err;
>
> - mmap_action_prepare(&desc.action, &desc);
> + err = mmap_action_prepare(&desc, &desc.action);
> + if (err)
> + return err;
> +
> set_vma_from_desc(vma, &desc);
> - return mmap_action_complete(&desc.action, vma);
> + return mmap_action_complete(vma, &desc.action);
> }
>
> static inline int compat_vma_mmap(struct file *file,
> diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
> index 947a3a0c2566..76c4b668bc62 100644
> --- a/tools/testing/vma/include/stubs.h
> +++ b/tools/testing/vma/include/stubs.h
> @@ -81,13 +81,14 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
> {
> }
>
> -static inline void mmap_action_prepare(struct mmap_action *action,
> - struct vm_area_desc *desc)
> +static inline int mmap_action_prepare(struct vm_area_desc *desc,
> + struct mmap_action *action)
> {
> + return 0;
> }
>
> -static inline int mmap_action_complete(struct mmap_action *action,
> - struct vm_area_struct *vma)
> +static inline int mmap_action_complete(struct vm_area_struct *vma,
> + struct mmap_action *action)
> {
> return 0;
> }
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-15 22:56 ` Suren Baghdasaryan
@ 2026-03-15 23:06 ` Suren Baghdasaryan
2026-03-16 14:47 ` Lorenzo Stoakes (Oracle)
2026-03-16 14:44 ` Lorenzo Stoakes (Oracle)
1 sibling, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-15 23:06 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 3:56 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > Rather than passing arbitrary fields, pass an mmap_action field directly to
> > mmap prepare and complete helpers to put all the action-specific logic in
> > the function actually doing the work.
> >
> > Additionally, allow mmap prepare functions to return an error so we can
> > error out as soon as possible if there is something logically incorrect in
> > the input.
> >
> > Update remap_pfn_range_prepare() to properly check the input range for the
> > CoW case.
>
> By "properly check" do you mean the replacement of desc->start and
> desc->end with action->remap.start and action->remap.start +
> action->remap.size when calling get_remap_pgoff() from
> remap_pfn_range_prepare()?
>
> >
> > While we're here, make remap_pfn_range_prepare_vma() a little neater, and
> > pass mmap_action directly to call_action_complete().
> >
> > Then, update compat_vma_mmap() to perform its logic directly, as
> > __compat_vma_map() is not used by anything so we don't need to export it.
>
> Not directly related to this patch, but while reviewing I was also
> checking the VMA locking rules in this mmap_prepare() + mmap() sequence,
> and I noticed that the new VMA flag modification functions like
> vma_set_flags_mask() do not assert vma_assert_locked(vma). It would be
> useful to add these asserts, but as a separate change. I will add it to
> my todo list.
>
> >
> > Also update compat_vma_mmap() to use vfs_mmap_prepare() rather than calling
> > the mmap_prepare op directly.
> >
> > Finally, update the VMA userland tests to reflect the changes.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > include/linux/fs.h | 2 -
> > include/linux/mm.h | 8 +--
> > mm/internal.h | 28 +++++---
> > mm/memory.c | 45 +++++++-----
> > mm/util.c | 112 +++++++++++++-----------------
> > mm/vma.c | 21 +++---
> > tools/testing/vma/include/dup.h | 9 ++-
> > tools/testing/vma/include/stubs.h | 9 +--
> > 8 files changed, 123 insertions(+), 111 deletions(-)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 8b3dd145b25e..a2628a12bd2b 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2058,8 +2058,6 @@ static inline bool can_mmap_file(struct file *file)
> > return true;
> > }
> >
> > -int __compat_vma_mmap(const struct file_operations *f_op,
> > - struct file *file, struct vm_area_struct *vma);
> > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> >
> > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 4c4fd55fc823..cc5960a84382 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4116,10 +4116,10 @@ static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
> > mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
> > }
> >
> > -void mmap_action_prepare(struct mmap_action *action,
> > - struct vm_area_desc *desc);
> > -int mmap_action_complete(struct mmap_action *action,
> > - struct vm_area_struct *vma);
> > +int mmap_action_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action);
> > +int mmap_action_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action);
> >
> > /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> > static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 95b583e7e4f7..7bfa85b5e78b 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1775,26 +1775,32 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
> > void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
> > int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
> >
> > -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
> > -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> > - unsigned long pfn, unsigned long size, pgprot_t pgprot);
> > +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action);
> > +int remap_pfn_range_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action);
> >
> > -static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > - unsigned long orig_pfn, unsigned long size)
> > +static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
> > {
> > + const unsigned long orig_pfn = action->remap.start_pfn;
> > + const unsigned long size = action->remap.size;
> > const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> >
> > - return remap_pfn_range_prepare(desc, pfn);
> > + action->remap.start_pfn = pfn;
> > + return remap_pfn_range_prepare(desc, action);
> > }
> >
> > static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
> > - unsigned long addr, unsigned long orig_pfn, unsigned long size,
> > - pgprot_t orig_prot)
> > + struct mmap_action *action)
> > {
> > - const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > - const pgprot_t prot = pgprot_decrypted(orig_prot);
> > + const unsigned long size = action->remap.size;
> > + const unsigned long orig_pfn = action->remap.start_pfn;
> > + const pgprot_t orig_prot = vma->vm_page_prot;
> >
> > - return remap_pfn_range_complete(vma, addr, pfn, size, prot);
> > + action->remap.pgprot = pgprot_decrypted(orig_prot);
I'm guessing it doesn't really matter, but after this change
action->remap.pgprot will store the decrypted value, whereas before it
was kept the way mmap_prepare() originally set it. We do pass the action
structure later to mmap_action_finish(), but that does not use
action->remap.pgprot, so this is probably harmless.
> > + action->remap.start_pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > + return remap_pfn_range_complete(vma, action);
> > }
> > [snip]
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
2026-03-12 20:27 ` [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback Lorenzo Stoakes (Oracle)
2026-03-13 0:12 ` Randy Dunlap
@ 2026-03-15 23:23 ` Suren Baghdasaryan
2026-03-16 19:16 ` Lorenzo Stoakes (Oracle)
1 sibling, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-15 23:23 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> This documentation makes it easier for a driver/file system implementer to
> correctly use this callback.
>
> It covers the fundamentals, whilst intentionally leaving the less lovely
> possible actions one might take undocumented (for instance - the
> success_hook, error_hook fields in mmap_action).
>
> The document also covers the new VMA flags implementation which is the only
> one which will work correctly with mmap_prepare.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> 1 file changed, 131 insertions(+)
> create mode 100644 Documentation/filesystems/mmap_prepare.rst
>
> diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> new file mode 100644
> index 000000000000..76908200f3a1
> --- /dev/null
> +++ b/Documentation/filesystems/mmap_prepare.rst
> @@ -0,0 +1,131 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================
> +mmap_prepare callback HOWTO
> +===========================
> +
> +Introduction
> +############
> +
> +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> +stability and security risk, and doesn't always permit the merging of adjacent
> +mappings, resulting in unnecessary memory fragmentation.
> +
> +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> +these problems.
> +
> +How To Use
> +##########
> +
> +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> +callback rather than an `mmap` one, e.g. for ext4:
> +
> +
> +.. code-block:: C
> +
> + const struct file_operations ext4_file_operations = {
> + ...
> + .mmap_prepare = ext4_file_mmap_prepare,
> + };
> +
> +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> +
> +Examining the `struct vm_area_desc` type:
> +
> +.. code-block:: C
> +
> + struct vm_area_desc {
> + /* Immutable state. */
> + const struct mm_struct *const mm;
> + struct file *const file; /* May vary from vm_file in stacked callers. */
> + unsigned long start;
> + unsigned long end;
> +
> + /* Mutable fields. Populated with initial state. */
> + pgoff_t pgoff;
> + struct file *vm_file;
> + vma_flags_t vma_flags;
> + pgprot_t page_prot;
> +
> + /* Write-only fields. */
> + const struct vm_operations_struct *vm_ops;
> + void *private_data;
> +
> + /* Take further action? */
> + struct mmap_action action;
So, action still belongs to /* Write-only fields. */ section? This is
nitpicky, but it might be better to have this as:
/* Write-only fields. */
const struct vm_operations_struct *vm_ops;
void *private_data;
struct mmap_action action; /* Take further action? */
> + };
> +
> +This is straightforward - you have all the fields you need to set up the
> +mapping, and you can update the mutable and writable fields, for instance:
> +
> +.. code-block:: C
> +
> + static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> + {
> + int ret;
> + struct file *file = desc->file;
> + struct inode *inode = file->f_mapping->host;
> +
> + ...
> +
> + file_accessed(file);
> + if (IS_DAX(file_inode(file))) {
> + desc->vm_ops = &ext4_dax_vm_ops;
> + vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> + } else {
> + desc->vm_ops = &ext4_file_vm_ops;
> + }
> + return 0;
> + }
> +
> +Importantly, you no longer have to dance around with reference counts or locks
> +when updating these fields - **you can simply go ahead and change them**.
> +
> +Everything is taken care of by the mapping code.
> +
> +VMA Flags
> +=========
> +
> +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> +locking done correctly for you), this is no longer necessary.
> +
> +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> +etc. - i.e. using a `VM_xxx` macro - has changed too.
> +
> +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> +of (where `desc` is a pointer to `struct vm_area_desc`):
> +
> +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> +  wish to test for (whether *any* are set), e.g. - `vma_desc_test_flags(desc,
> + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> + otherwise `false`.
> +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> + additional flags specified by a comma-separated list,
> + e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> + flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> +
> +Actions
> +=======
> +
> +You can now easily have actions performed upon a mapping once it is set up, by
> +utilising simple helper functions invoked upon the `struct vm_area_desc`
> +pointer. These are:
> +
> +* `mmap_action_remap()` - Remaps a range consisting only of PFNs, starting at
> +  a given virtual address and PFN, for a specified size.
> +
> +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> + entire mapping from `start_pfn` onward.
> +
> +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> + remap.
> +
> +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> + the entire mapping from `start_pfn` onward.
> +
> +**NOTE:** The 'action' field should normally never be manipulated directly;
> +rather, you ought to use one of these helpers.
I'm guessing the start and size parameters passed to
mmap_action_remap() and such are restricted by vm_area_desc.start and
vm_area_desc.end. If so, should we document those restrictions and
enforce them in the code?
> + struct vm_area_desc {
> + /* Immutable state. */
> + const struct mm_struct *const mm;
> + struct file *const file; /* May vary from vm_file in stacked callers. */
> + unsigned long start;
> + unsigned long end;
> --
> 2.53.0
>
* Re: [PATCH 03/15] mm: document vm_operations_struct->open the same as close()
2026-03-12 20:27 ` [PATCH 03/15] mm: document vm_operations_struct->open the same as close() Lorenzo Stoakes (Oracle)
@ 2026-03-16 0:43 ` Suren Baghdasaryan
2026-03-16 14:31 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-16 0:43 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> Describe when the operation is invoked and the context in which it is
> invoked, matching the description already added for vm_op->close().
>
> While we're here, update all outdated references to an 'area' field for
> VMAs to the more consistent 'vma'.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> include/linux/mm.h | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cc5960a84382..12a0b4c63736 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -748,15 +748,20 @@ struct vm_uffd_ops;
> * to the functions called when a no-page or a wp-page exception occurs.
> */
> struct vm_operations_struct {
> - void (*open)(struct vm_area_struct * area);
> + /**
> + * @open: Called when a VMA is remapped or split. Not called upon first
> + * mapping a VMA.
It's also called from dup_mmap() which is part of forking.
> + * Context: User context. May sleep. Caller holds mmap_lock.
> + */
> + void (*open)(struct vm_area_struct *vma);
> /**
> * @close: Called when the VMA is being removed from the MM.
> * Context: User context. May sleep. Caller holds mmap_lock.
> */
> - void (*close)(struct vm_area_struct * area);
> + void (*close)(struct vm_area_struct *vma);
> /* Called any time before splitting to check if it's allowed */
> - int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> - int (*mremap)(struct vm_area_struct *area);
> + int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> + int (*mremap)(struct vm_area_struct *vma);
> /*
> * Called by mprotect() to make driver-specific permission
> * checks before mprotect() is finalised. The VMA must not
> @@ -768,7 +773,7 @@ struct vm_operations_struct {
> vm_fault_t (*huge_fault)(struct vm_fault *vmf, unsigned int order);
> vm_fault_t (*map_pages)(struct vm_fault *vmf,
> pgoff_t start_pgoff, pgoff_t end_pgoff);
> - unsigned long (*pagesize)(struct vm_area_struct * area);
> + unsigned long (*pagesize)(struct vm_area_struct *vma);
>
> /* notification that a previously read-only page is about to become
> * writable, if an error is returned it will cause a SIGBUS */
> --
> 2.53.0
>
* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-13 11:58 ` Lorenzo Stoakes (Oracle)
@ 2026-03-16 2:18 ` Suren Baghdasaryan
2026-03-16 13:39 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-16 2:18 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Fri, Mar 13, 2026 at 4:58 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Fri, Mar 13, 2026 at 04:02:36AM -0700, Usama Arif wrote:
> > On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >
> > > Previously, when a driver needed to do something like establish a reference
> > > count, it could do so in the mmap hook in the knowledge that the mapping
> > > would succeed.
> > >
> > > With the introduction of f_op->mmap_prepare this is no longer the case, as
> > > it is invoked prior to actually establishing the mapping.
> > >
> > > To take this into account, introduce a new vm_ops->mapped callback which is
> > > invoked when the VMA is first mapped (though notably - not when it is
> > > merged - which is correct and mirrors existing mmap/open/close behaviour).
> > >
> > > We do better than vm_ops->open() here, as this callback can return an
> > > error, at which point the VMA will be unmapped.
> > >
> > > Note that vm_ops->mapped() is invoked after any mmap action is
> > > complete (such as I/O remapping).
> > >
> > > We intentionally do not expose the VMA at this point, exposing only the
> > > fields that could be used, and an output parameter in case the operation
> > > needs to update the vma->vm_private_data field.
> > >
> > > In order to deal with stacked filesystems which invoke inner filesystem's
> > > mmap() invocations, add __compat_vma_mapped() and invoke it on
> > > vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> > > handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> > > callback.
> > >
> > > We can now also remove call_action_complete() and invoke
> > > mmap_action_complete() directly, as we separate out the rmap lock logic to
> > > be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> > >
> > > We also abstract unmapping of a VMA on mmap action completion into its own
> > > helper function, unmap_vma_locked().
> > >
> > > Additionally, update VMA userland test headers to reflect the change.
> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > > include/linux/fs.h | 9 +++-
> > > include/linux/mm.h | 17 +++++++
> > > mm/internal.h | 10 ++++
> > > mm/util.c | 86 ++++++++++++++++++++++++---------
> > > mm/vma.c | 41 +++++++++++-----
> > > tools/testing/vma/include/dup.h | 34 ++++++++++++-
> > > 6 files changed, 158 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index a2628a12bd2b..c390f5c667e3 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> > > }
> > >
> > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > > +int __vma_check_mmap_hook(struct vm_area_struct *vma);
> > >
> > > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > > {
> > > + int err;
> > > +
> > > if (file->f_op->mmap_prepare)
> > > return compat_vma_mmap(file, vma);
> > >
> > > - return file->f_op->mmap(file, vma);
> > > + err = file->f_op->mmap(file, vma);
> > > + if (err)
> > > + return err;
> > > +
> > > + return __vma_check_mmap_hook(vma);
> > > }
> > >
> > > static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 12a0b4c63736..7333d5db1221 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -759,6 +759,23 @@ struct vm_operations_struct {
> > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > */
> > > void (*close)(struct vm_area_struct *vma);
> > > + /**
> > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > + * the new VMA is merged with an adjacent VMA.
> > > + *
> > > + * The @vm_private_data field is an output field allowing the user to
> > > + * modify vma->vm_private_data as necessary.
> > > + *
> > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > + * set from f_op->mmap.
> > > + *
> > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > + * be unmapped.
> > > + *
> > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > + */
> > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > + const struct file *file, void **vm_private_data);
> > > /* Called any time before splitting to check if it's allowed */
> > > int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> > > int (*mremap)(struct vm_area_struct *vma);
> > > diff --git a/mm/internal.h b/mm/internal.h
> > > index 7bfa85b5e78b..f0f2cf1caa36 100644
> > > --- a/mm/internal.h
> > > +++ b/mm/internal.h
> > > @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> > > * mmap hook and safely handle error conditions. On error, VMA hooks will be
> > > * mutated.
> > > *
> > > + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> > > + *
What exactly would one do to "prefer f_op->mmap_prepare()"?
Since you are adding this comment for mmap_file(), I think you need to
describe more specifically what one should call instead.
> > > * @file: File which backs the mapping.
> > > * @vma: VMA which we are mapping.
> > > *
> > > @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> > > /* unmap_vmas is in mm/memory.c */
> > > void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
> > >
> > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > +{
> > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > +
> > > + mmap_assert_locked(vma->vm_mm);
You must hold the mmap write lock when unmapping. Would be better to
assert mmap_assert_write_locked() or even vma_assert_write_locked(),
which implies mmap_assert_write_locked().
> > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > +}
> > > +
> > > #ifdef CONFIG_MMU
> > >
> > > static inline void get_anon_vma(struct anon_vma *anon_vma)
> > > diff --git a/mm/util.c b/mm/util.c
> > > index dba1191725b6..2b0ed54008d6 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> > > EXPORT_SYMBOL(flush_dcache_folio);
> > > #endif
> > >
> > > +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > +{
> > > + struct vm_area_desc desc = {
> > > + .mm = vma->vm_mm,
> > > + .file = file,
> > > + .start = vma->vm_start,
> > > + .end = vma->vm_end,
> > > +
> > > + .pgoff = vma->vm_pgoff,
> > > + .vm_file = vma->vm_file,
> > > + .vma_flags = vma->flags,
> > > + .page_prot = vma->vm_page_prot,
> > > +
> > > + .action.type = MMAP_NOTHING, /* Default */
> > > + };
> > > + int err;
> > > +
> > > + err = vfs_mmap_prepare(file, &desc);
> > > + if (err)
> > > + return err;
> > > +
> > > + err = mmap_action_prepare(&desc, &desc.action);
> > > + if (err)
> > > + return err;
> > > +
> > > + set_vma_from_desc(vma, &desc);
> > > + return mmap_action_complete(vma, &desc.action);
> > > +}
> > > +
> > > +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> > > +{
> > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > + void *vm_private_data = vma->vm_private_data;
> > > + int err;
> > > +
> > > + if (!vm_ops->mapped)
> > > + return 0;
> > > +
> >
> > Hello!
> >
> > Can vm_ops be NULL here? __compat_vma_mapped() is called from
> > compat_vma_mmap(), which is reached when a filesystem provides
> > mmap_prepare. If the mmap_prepare hook does not set desc->vm_ops,
> > vma->vm_ops will be NULL and this dereferences a NULL pointer.
>
> I _think_ for this to ever be invoked, you would need to be dealing with a
> file-backed VMA so vm_ops->fault would HAVE to be defined.
>
> But you're right anyway as a matter of principle we should check it! Will fix.
>
> >
> > For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
> > a NULL pointer dereference here.
> >
> > Would need to do
> > if (!vm_ops || !vm_ops->mapped)
> > return 0;
> >
> > here
>
> Yes.
>
> >
> >
> > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> > > + &vm_private_data);
> > > + if (err)
> > > + unmap_vma_locked(vma);
> >
> > when mapped() returns an error, unmap_vma_locked(vma) is called
> > but execution continues into the vm_private_data update below. After
> > unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
> > entirely), so accessing vma->vm_private_data after that is a
> > use-after-free.
>
> Very good point :) will fix thanks!
>
> Probably:
>
> if (err)
> unmap_vma_locked(vma);
> else if (vm_private_data != vma->vm_private_data)
> vma->vm_private_data = vm_private_data;
>
> return err;
>
> Would be fine.
>
> >
> > Probably need to do:
> > if (err) {
> > unmap_vma_locked(vma);
> > return err;
> > }
> >
> > > + /* Update private data if changed. */
> > > + if (vm_private_data != vma->vm_private_data)
> > > + vma->vm_private_data = vm_private_data;
> > > +
> > > + return err;
> > > +}
> > > +
> > > /**
> > > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > > * existing VMA and execute any requested actions.
> > > @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> > > */
> > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > {
> > > - struct vm_area_desc desc = {
> > > - .mm = vma->vm_mm,
> > > - .file = file,
> > > - .start = vma->vm_start,
> > > - .end = vma->vm_end,
> > > -
> > > - .pgoff = vma->vm_pgoff,
> > > - .vm_file = vma->vm_file,
> > > - .vma_flags = vma->flags,
> > > - .page_prot = vma->vm_page_prot,
> > > -
> > > - .action.type = MMAP_NOTHING, /* Default */
> > > - };
> > > int err;
> > >
> > > - err = vfs_mmap_prepare(file, &desc);
> > > - if (err)
> > > - return err;
> > > -
> > > - err = mmap_action_prepare(&desc, &desc.action);
> > > + err = __compat_vma_mmap(file, vma);
> > > if (err)
> > > return err;
> > >
> > > - set_vma_from_desc(vma, &desc);
> > > - return mmap_action_complete(vma, &desc.action);
> > > + return __compat_vma_mapped(file, vma);
> > > }
> > > EXPORT_SYMBOL(compat_vma_mmap);
> > >
> > > +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> > > +{
> > > + /* vm_ops->mapped is not valid if mmap() is specified. */
> > > + if (WARN_ON_ONCE(vma->vm_ops->mapped))
> > > + return -EINVAL;
> >
> > I think vma->vm_ops can be NULL here. Should be:
> >
> > if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
> > return -EINVAL;
>
> I think again you'd probably only invoke this on file-backed so be ok, but again
> as a matter of principle we should check it so will fix, thanks!
>
> >
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(__vma_check_mmap_hook);
nit: Any reason __vma_check_mmap_hook() is not inlined next to its
user vfs_mmap()?
> > > +
> > > static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> > > const struct page *page)
> > > {
> > > @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> > > * invoked if we do NOT merge, so we only clean up the VMA we created.
> > > */
> > > if (err) {
> > > - const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > -
> > > - do_munmap(current->mm, vma->vm_start, len, NULL);
> > > -
> > > + unmap_vma_locked(vma);
> > > if (action->error_hook) {
> > > /* We may want to filter the error. */
> > > err = action->error_hook(err);
> > > diff --git a/mm/vma.c b/mm/vma.c
> > > index 054cf1d262fb..ef9f5a5365d1 100644
> > > --- a/mm/vma.c
> > > +++ b/mm/vma.c
> > > @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > > return false;
> > > }
> > >
> > > -static int call_action_complete(struct mmap_state *map,
> > > - struct mmap_action *action,
> > > - struct vm_area_struct *vma)
> > > +static int call_mapped_hook(struct vm_area_struct *vma)
> > > {
> > > - int ret;
> > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > + void *vm_private_data = vma->vm_private_data;
> > > + int err;
> > >
> > > - ret = mmap_action_complete(vma, action);
> > > + if (!vm_ops || !vm_ops->mapped)
> > > + return 0;
> > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> > > + vma->vm_file, &vm_private_data);
> > > + if (err) {
> > > + unmap_vma_locked(vma);
> > > + return err;
> > > + }
> > > + /* Update private data if changed. */
> > > + if (vm_private_data != vma->vm_private_data)
> > > + vma->vm_private_data = vm_private_data;
> > > + return 0;
> > > +}
> > >
> > > - /* If we held the file rmap we need to release it. */
> > > - if (map->hold_file_rmap_lock) {
> > > - struct file *file = vma->vm_file;
> > > +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> > > + struct vm_area_struct *vma)
> > > +{
> > > + struct file *file;
> > >
> > > - i_mmap_unlock_write(file->f_mapping);
> > > - }
> > > - return ret;
> > > + if (!map->hold_file_rmap_lock)
> > > + return;
> > > + file = vma->vm_file;
> > > + i_mmap_unlock_write(file->f_mapping);
> > > }
> > >
> > > static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > __mmap_complete(&map, vma);
> > >
> > > if (have_mmap_prepare && allocated_new) {
> > > - error = call_action_complete(&map, &desc.action, vma);
> > > + error = mmap_action_complete(vma, &desc.action);
> > > + if (!error)
> > > + error = call_mapped_hook(vma);
> > >
> > > + maybe_drop_file_rmap_lock(&map, vma);
> > > if (error)
> > > return error;
> > > }
> > > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > > index 908beb263307..47d8db809f31 100644
> > > --- a/tools/testing/vma/include/dup.h
> > > +++ b/tools/testing/vma/include/dup.h
> > > @@ -606,12 +606,34 @@ struct vm_area_struct {
> > > } __randomize_layout;
> > >
> > > struct vm_operations_struct {
> > > - void (*open)(struct vm_area_struct * area);
> > > + /**
> > > + * @open: Called when a VMA is remapped or split. Not called upon first
> > > + * mapping a VMA.
> > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > + */
This comment should have been introduced in the previous patch.
> > > + void (*open)(struct vm_area_struct *vma);
> > > /**
> > > * @close: Called when the VMA is being removed from the MM.
> > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > */
> > > - void (*close)(struct vm_area_struct * area);
> > > + void (*close)(struct vm_area_struct *vma);
> > > + /**
> > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > + * the new VMA is merged with an adjacent VMA.
> > > + *
> > > + * The @vm_private_data field is an output field allowing the user to
> > > + * modify vma->vm_private_data as necessary.
> > > + *
> > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > + * set from f_op->mmap.
> > > + *
> > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > + * be unmapped.
> > > + *
> > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > + */
> > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > + const struct file *file, void **vm_private_data);
> > > /* Called any time before splitting to check if it's allowed */
> > > int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> > > int (*mremap)(struct vm_area_struct *area);
> > > @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> > > swap(vma->vm_file, file);
> > > fput(file);
> > > }
> > > +
> > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > +{
> > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > +
> > > + mmap_assert_locked(vma->vm_mm);
> > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > +}
> > > --
> > > 2.53.0
> > >
> > >
>
> Cheers, Lorenzo
* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-13 12:00 ` Lorenzo Stoakes (Oracle)
@ 2026-03-16 2:32 ` Suren Baghdasaryan
2026-03-16 14:29 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-16 2:32 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >
> > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > the deprecated mmap callback.
> > >
> > > However, it did not account for the fact that mmap_prepare can fail to map
> > > due to an out of memory error, and thus should not be incrementing a
> > > reference count on mmap_prepare.
This is a bit confusing. I see the current implementation does
afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
Is there really a problem?
> > >
> > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > this operation to that callback which is only invoked once the mapping is
> > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > write locks are held).
> > >
> > > Therefore add afs_mapped() to implement this callback for AFS.
> > >
> > > In practice the mapping allocations are 'too small to fail' so this is
> > > something that realistically should never happen in practice (or would do
> > > so in a case where the process is about to die anyway), but we should still
> > > handle this.
nit: I would drop the above paragraph. If it's impossible why are you
handling it? If it's unlikely, then handling it is even more
important.
> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > > fs/afs/file.c | 20 ++++++++++++++++----
> > > 1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > > index f609366fd2ac..69ef86f5e274 100644
> > > --- a/fs/afs/file.c
> > > +++ b/fs/afs/file.c
> > > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > > static void afs_vm_open(struct vm_area_struct *area);
> > > static void afs_vm_close(struct vm_area_struct *area);
> > > static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > + const struct file *file, void **vm_private_data);
> > >
> > > const struct file_operations afs_file_operations = {
> > > .open = afs_open,
> > > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > > };
> > >
> > > static const struct vm_operations_struct afs_vm_ops = {
> > > + .mapped = afs_mapped,
> > > .open = afs_vm_open,
> > > .close = afs_vm_close,
> > > .fault = filemap_fault,
> > > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > > afs_add_open_mmap(vnode);
> >
> > Is the above afs_add_open_mmap an additional one, which could cause a reference
> > leak? Does the above one need to be removed and only the one in afs_mapped()
> > needs to be kept?
>
> Ah yeah good spot, will fix thanks!
>
> >
> > >
> > > ret = generic_file_mmap_prepare(desc);
> > > - if (ret == 0)
> > > - desc->vm_ops = &afs_vm_ops;
> > > - else
> > > - afs_drop_open_mmap(vnode);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + desc->vm_ops = &afs_vm_ops;
> > > return ret;
> > > }
> > >
> > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > + const struct file *file, void **vm_private_data)
> > > +{
> > > + struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > > +
> > > + afs_add_open_mmap(vnode);
> > > + return 0;
> > > +}
> > > +
> > > static void afs_vm_open(struct vm_area_struct *vma)
> > > {
> > > afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > > --
> > > 2.53.0
> > >
> > >
>
> Cheers, Lorenzo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-16 2:18 ` Suren Baghdasaryan
@ 2026-03-16 13:39 ` Lorenzo Stoakes (Oracle)
2026-03-16 23:39 ` Suren Baghdasaryan
0 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 13:39 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 07:18:38PM -0700, Suren Baghdasaryan wrote:
> On Fri, Mar 13, 2026 at 4:58 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > On Fri, Mar 13, 2026 at 04:02:36AM -0700, Usama Arif wrote:
> > > On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > >
> > > > Previously, when a driver needed to do something like establish a reference
> > > > count, it could do so in the mmap hook in the knowledge that the mapping
> > > > would succeed.
> > > >
> > > > With the introduction of f_op->mmap_prepare this is no longer the case, as
> > > > it is invoked prior to actually establishing the mapping.
> > > >
> > > > To take this into account, introduce a new vm_ops->mapped callback which is
> > > > invoked when the VMA is first mapped (though notably - not when it is
> > > > merged - which is correct and mirrors existing mmap/open/close behaviour).
> > > >
> > > > We do better than vm_ops->open() here, as this callback can return an
> > > > error, at which point the VMA will be unmapped.
> > > >
> > > > Note that vm_ops->mapped() is invoked after any mmap action is
> > > > complete (such as I/O remapping).
> > > >
> > > > We intentionally do not expose the VMA at this point, exposing only the
> > > > fields that could be used, and an output parameter in case the operation
> > > > needs to update the vma->vm_private_data field.
> > > >
> > > > In order to deal with stacked filesystems which invoke inner filesystem's
> > > > mmap() invocations, add __compat_vma_mapped() and invoke it on
> > > > vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> > > > handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> > > > callback.
> > > >
> > > > We can now also remove call_action_complete() and invoke
> > > > mmap_action_complete() directly, as we separate out the rmap lock logic to
> > > > be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> > > >
> > > > We also abstract unmapping of a VMA on mmap action completion into its own
> > > > helper function, unmap_vma_locked().
> > > >
> > > > Additionally, update VMA userland test headers to reflect the change.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > ---
> > > > include/linux/fs.h | 9 +++-
> > > > include/linux/mm.h | 17 +++++++
> > > > mm/internal.h | 10 ++++
> > > > mm/util.c | 86 ++++++++++++++++++++++++---------
> > > > mm/vma.c | 41 +++++++++++-----
> > > > tools/testing/vma/include/dup.h | 34 ++++++++++++-
> > > > 6 files changed, 158 insertions(+), 39 deletions(-)
> > > >
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > index a2628a12bd2b..c390f5c667e3 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> > > > }
> > > >
> > > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > > > +int __vma_check_mmap_hook(struct vm_area_struct *vma);
> > > >
> > > > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > > > {
> > > > + int err;
> > > > +
> > > > if (file->f_op->mmap_prepare)
> > > > return compat_vma_mmap(file, vma);
> > > >
> > > > - return file->f_op->mmap(file, vma);
> > > > + err = file->f_op->mmap(file, vma);
> > > > + if (err)
> > > > + return err;
> > > > +
> > > > + return __vma_check_mmap_hook(vma);
> > > > }
> > > >
> > > > static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > index 12a0b4c63736..7333d5db1221 100644
> > > > --- a/include/linux/mm.h
> > > > +++ b/include/linux/mm.h
> > > > @@ -759,6 +759,23 @@ struct vm_operations_struct {
> > > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > > */
> > > > void (*close)(struct vm_area_struct *vma);
> > > > + /**
> > > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > > + * the new VMA is merged with an adjacent VMA.
> > > > + *
> > > > + * The @vm_private_data field is an output field allowing the user to
> > > > + * modify vma->vm_private_data as necessary.
> > > > + *
> > > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > > + * set from f_op->mmap.
> > > > + *
> > > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > > + * be unmapped.
> > > > + *
> > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > + */
> > > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > + const struct file *file, void **vm_private_data);
> > > > /* Called any time before splitting to check if it's allowed */
> > > > int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> > > > int (*mremap)(struct vm_area_struct *vma);
> > > > diff --git a/mm/internal.h b/mm/internal.h
> > > > index 7bfa85b5e78b..f0f2cf1caa36 100644
> > > > --- a/mm/internal.h
> > > > +++ b/mm/internal.h
> > > > @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> > > > * mmap hook and safely handle error conditions. On error, VMA hooks will be
> > > > * mutated.
> > > > *
> > > > + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> > > > + *
>
> What exactly would one do to "prefer f_op->mmap_prepare()"?
I'm saying a person should implement f_op->mmap_prepare() rather than
f_op->mmap(), since the latter is deprecated :)
I think that's pretty clear no?
> Since you are adding this comment for mmap_file(), I think you need to
> describe more specifically what one should call instead.
I think it'd be a complete distraction, since if you're at the point of calling
mmap_file() you're already not implementing mmap_prepare, except as a
compatibility layer.
I mean maybe I'll just drop this as it seems to be causing confusion.
>
> > > > * @file: File which backs the mapping.
> > > > * @vma: VMA which we are mapping.
> > > > *
> > > > @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> > > > /* unmap_vmas is in mm/memory.c */
> > > > void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
> > > >
> > > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > > +{
> > > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > +
> > > > + mmap_assert_locked(vma->vm_mm);
>
> You must hold the mmap write lock when unmapping. Would be better to
> assert mmap_assert_write_locked() or even vma_assert_write_locked(),
> which implies mmap_assert_write_locked().
I'm not sure why we don't assert this in those paths.
I think I assumed we could only assert readonly because one of those paths
downgrades the mmap write lock to a read lock.
I don't think we can do a VMA write lock assert here, since at the point of
do_munmap() all callers can't possibly have the VMA write lock, since they are
_looking up_ the VMA at the specified address.
But I can convert this to an mmap_assert_write_locked()!
>
> > > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > > +}
> > > > +
> > > > #ifdef CONFIG_MMU
> > > >
> > > > static inline void get_anon_vma(struct anon_vma *anon_vma)
> > > > diff --git a/mm/util.c b/mm/util.c
> > > > index dba1191725b6..2b0ed54008d6 100644
> > > > --- a/mm/util.c
> > > > +++ b/mm/util.c
> > > > @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> > > > EXPORT_SYMBOL(flush_dcache_folio);
> > > > #endif
> > > >
> > > > +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > > +{
> > > > + struct vm_area_desc desc = {
> > > > + .mm = vma->vm_mm,
> > > > + .file = file,
> > > > + .start = vma->vm_start,
> > > > + .end = vma->vm_end,
> > > > +
> > > > + .pgoff = vma->vm_pgoff,
> > > > + .vm_file = vma->vm_file,
> > > > + .vma_flags = vma->flags,
> > > > + .page_prot = vma->vm_page_prot,
> > > > +
> > > > + .action.type = MMAP_NOTHING, /* Default */
> > > > + };
> > > > + int err;
> > > > +
> > > > + err = vfs_mmap_prepare(file, &desc);
> > > > + if (err)
> > > > + return err;
> > > > +
> > > > + err = mmap_action_prepare(&desc, &desc.action);
> > > > + if (err)
> > > > + return err;
> > > > +
> > > > + set_vma_from_desc(vma, &desc);
> > > > + return mmap_action_complete(vma, &desc.action);
> > > > +}
> > > > +
> > > > +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> > > > +{
> > > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > > + void *vm_private_data = vma->vm_private_data;
> > > > + int err;
> > > > +
> > > > + if (!vm_ops->mapped)
> > > > + return 0;
> > > > +
> > >
> > > Hello!
> > >
> > > Can vm_ops be NULL here? __compat_vma_mapped() is called from
> > > compat_vma_mmap(), which is reached when a filesystem provides
> > > mmap_prepare. If the mmap_prepare hook does not set desc->vm_ops,
> > > vma->vm_ops will be NULL and this dereferences a NULL pointer.
> >
> > I _think_ for this to ever be invoked, you would need to be dealing with a
> > file-backed VMA so vm_ops->fault would HAVE to be defined.
> >
> > But you're right anyway as a matter of principle we should check it! Will fix.
> >
> > >
> > > For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
> > > a NULL pointer dereference here.
> > >
> > > Would need to do
> > > if (!vm_ops || !vm_ops->mapped)
> > > return 0;
> > >
> > > here
> >
> > Yes.
> >
> > >
> > >
> > > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> > > > + &vm_private_data);
> > > > + if (err)
> > > > + unmap_vma_locked(vma);
> > >
> > > when mapped() returns an error, unmap_vma_locked(vma) is called
> > > but execution continues into the vm_private_data update below. After
> > > unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
> > > entirely), so accessing vma->vm_private_data after that is a
> > > use-after-free.
> >
> > Very good point :) will fix thanks!
> >
> > Probably:
> >
> > if (err)
> > unmap_vma_locked(vma);
> > else if (vm_private_data != vma->vm_private_data)
> > vma->vm_private_data = vm_private_data;
> >
> > return err;
> >
> > Would be fine.
> >
> > >
> > > Probably need to do:
> > > if (err) {
> > > unmap_vma_locked(vma);
> > > return err;
> > > }
> > >
> > > > + /* Update private data if changed. */
> > > > + if (vm_private_data != vma->vm_private_data)
> > > > + vma->vm_private_data = vm_private_data;
> > > > +
> > > > + return err;
> > > > +}
> > > > +
> > > > /**
> > > > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > > > * existing VMA and execute any requested actions.
> > > > @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> > > > */
> > > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > > {
> > > > - struct vm_area_desc desc = {
> > > > - .mm = vma->vm_mm,
> > > > - .file = file,
> > > > - .start = vma->vm_start,
> > > > - .end = vma->vm_end,
> > > > -
> > > > - .pgoff = vma->vm_pgoff,
> > > > - .vm_file = vma->vm_file,
> > > > - .vma_flags = vma->flags,
> > > > - .page_prot = vma->vm_page_prot,
> > > > -
> > > > - .action.type = MMAP_NOTHING, /* Default */
> > > > - };
> > > > int err;
> > > >
> > > > - err = vfs_mmap_prepare(file, &desc);
> > > > - if (err)
> > > > - return err;
> > > > -
> > > > - err = mmap_action_prepare(&desc, &desc.action);
> > > > + err = __compat_vma_mmap(file, vma);
> > > > if (err)
> > > > return err;
> > > >
> > > > - set_vma_from_desc(vma, &desc);
> > > > - return mmap_action_complete(vma, &desc.action);
> > > > + return __compat_vma_mapped(file, vma);
> > > > }
> > > > EXPORT_SYMBOL(compat_vma_mmap);
> > > >
> > > > +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> > > > +{
> > > > + /* vm_ops->mapped is not valid if mmap() is specified. */
> > > > + if (WARN_ON_ONCE(vma->vm_ops->mapped))
> > > > + return -EINVAL;
> > >
> > > I think vma->vm_ops can be NULL here. Should be:
> > >
> > > if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
> > > return -EINVAL;
> >
I think again you'd probably only invoke this on a file-backed VMA so it'd be ok,
but again as a matter of principle we should check it, so will fix, thanks!
> >
> > >
> > > > +
> > > > + return 0;
> > > > +}
> > > > +EXPORT_SYMBOL(__vma_check_mmap_hook);
>
> nit: Any reason __vma_check_mmap_hook() is not inlined next to its
> user vfs_mmap()?
Headers fun: fs.h is a 'before mm.h' header, so vm_operations_struct is not
declared yet at that point and we can't actually do the check there.
>
> > > > +
> > > > static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> > > > const struct page *page)
> > > > {
> > > > @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> > > > * invoked if we do NOT merge, so we only clean up the VMA we created.
> > > > */
> > > > if (err) {
> > > > - const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > -
> > > > - do_munmap(current->mm, vma->vm_start, len, NULL);
> > > > -
> > > > + unmap_vma_locked(vma);
> > > > if (action->error_hook) {
> > > > /* We may want to filter the error. */
> > > > err = action->error_hook(err);
> > > > diff --git a/mm/vma.c b/mm/vma.c
> > > > index 054cf1d262fb..ef9f5a5365d1 100644
> > > > --- a/mm/vma.c
> > > > +++ b/mm/vma.c
> > > > @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > > > return false;
> > > > }
> > > >
> > > > -static int call_action_complete(struct mmap_state *map,
> > > > - struct mmap_action *action,
> > > > - struct vm_area_struct *vma)
> > > > +static int call_mapped_hook(struct vm_area_struct *vma)
> > > > {
> > > > - int ret;
> > > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > > + void *vm_private_data = vma->vm_private_data;
> > > > + int err;
> > > >
> > > > - ret = mmap_action_complete(vma, action);
> > > > + if (!vm_ops || !vm_ops->mapped)
> > > > + return 0;
> > > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> > > > + vma->vm_file, &vm_private_data);
> > > > + if (err) {
> > > > + unmap_vma_locked(vma);
> > > > + return err;
> > > > + }
> > > > + /* Update private data if changed. */
> > > > + if (vm_private_data != vma->vm_private_data)
> > > > + vma->vm_private_data = vm_private_data;
> > > > + return 0;
> > > > +}
> > > >
> > > > - /* If we held the file rmap we need to release it. */
> > > > - if (map->hold_file_rmap_lock) {
> > > > - struct file *file = vma->vm_file;
> > > > +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> > > > + struct vm_area_struct *vma)
> > > > +{
> > > > + struct file *file;
> > > >
> > > > - i_mmap_unlock_write(file->f_mapping);
> > > > - }
> > > > - return ret;
> > > > + if (!map->hold_file_rmap_lock)
> > > > + return;
> > > > + file = vma->vm_file;
> > > > + i_mmap_unlock_write(file->f_mapping);
> > > > }
> > > >
> > > > static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > > @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > > __mmap_complete(&map, vma);
> > > >
> > > > if (have_mmap_prepare && allocated_new) {
> > > > - error = call_action_complete(&map, &desc.action, vma);
> > > > + error = mmap_action_complete(vma, &desc.action);
> > > > + if (!error)
> > > > + error = call_mapped_hook(vma);
> > > >
> > > > + maybe_drop_file_rmap_lock(&map, vma);
> > > > if (error)
> > > > return error;
> > > > }
> > > > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > > > index 908beb263307..47d8db809f31 100644
> > > > --- a/tools/testing/vma/include/dup.h
> > > > +++ b/tools/testing/vma/include/dup.h
> > > > @@ -606,12 +606,34 @@ struct vm_area_struct {
> > > > } __randomize_layout;
> > > >
> > > > struct vm_operations_struct {
> > > > - void (*open)(struct vm_area_struct * area);
> > > > + /**
> > > > + * @open: Called when a VMA is remapped or split. Not called upon first
> > > > + * mapping a VMA.
> > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > + */
>
> This comment should have been introduced in the previous patch.
It's the testing code, so it's not really important. But if I respin I'll fix... :)
>
> > > > + void (*open)(struct vm_area_struct *vma);
> > > > /**
> > > > * @close: Called when the VMA is being removed from the MM.
> > > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > > */
> > > > - void (*close)(struct vm_area_struct * area);
> > > > + void (*close)(struct vm_area_struct *vma);
> > > > + /**
> > > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > > + * the new VMA is merged with an adjacent VMA.
> > > > + *
> > > > + * The @vm_private_data field is an output field allowing the user to
> > > > + * modify vma->vm_private_data as necessary.
> > > > + *
> > > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > > + * set from f_op->mmap.
> > > > + *
> > > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > > + * be unmapped.
> > > > + *
> > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > + */
> > > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > + const struct file *file, void **vm_private_data);
> > > > /* Called any time before splitting to check if it's allowed */
> > > > int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> > > > int (*mremap)(struct vm_area_struct *area);
> > > > @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> > > > swap(vma->vm_file, file);
> > > > fput(file);
> > > > }
> > > > +
> > > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > > +{
> > > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > +
> > > > + mmap_assert_locked(vma->vm_mm);
> > > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > > +}
> > > > --
> > > > 2.53.0
> > > >
> > > >
> >
> > Cheers, Lorenzo
* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-16 2:32 ` Suren Baghdasaryan
@ 2026-03-16 14:29 ` Lorenzo Stoakes (Oracle)
2026-03-17 3:41 ` Suren Baghdasaryan
0 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 14:29 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 07:32:54PM -0700, Suren Baghdasaryan wrote:
> On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > >
> > > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > > the deprecated mmap callback.
> > > >
> > > > However, it did not account for the fact that mmap_prepare can fail to map
> > > > due to an out of memory error, and thus should not be incrementing a
> > > > reference count on mmap_prepare.
>
> This is a bit confusing. I see the current implementation does
> afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
> does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
> Is there really a problem?
Firstly, mmap_prepare is invoked before we try to merge, so the VMA could in
theory get merged and then the refcounting will be wrong.
Secondly, mmap_prepare occurs at such a time that it is _possible_ for the
allocation failures described below to happen.
I'll update the commit message to reflect the merge aspect actually.
>
> > > >
> > > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > > this operation to that callback which is only invoked once the mapping is
> > > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > > write locks are held).
> > > >
> > > > Therefore add afs_mapped() to implement this callback for AFS.
> > > >
> > > > In practice the mapping allocations are 'too small to fail' so this is
> > > > something that realistically should never happen in practice (or would do
> > > > so in a case where the process is about to die anyway), but we should still
> > > > handle this.
>
> nit: I would drop the above paragraph. If it's impossible why are you
> handling it? If it's unlikely, then handling it is even more
> important.
Sure I can drop it, but it's an ongoing thing with these small allocations.
I wish we could just move to a scenario where we can simply assume allocations
will always succeed :)
Vlasta - thoughts?
Cheers, Lorenzo
>
> > > >
> > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > ---
> > > > fs/afs/file.c | 20 ++++++++++++++++----
> > > > 1 file changed, 16 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > > > index f609366fd2ac..69ef86f5e274 100644
> > > > --- a/fs/afs/file.c
> > > > +++ b/fs/afs/file.c
> > > > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > > > static void afs_vm_open(struct vm_area_struct *area);
> > > > static void afs_vm_close(struct vm_area_struct *area);
> > > > static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > + const struct file *file, void **vm_private_data);
> > > >
> > > > const struct file_operations afs_file_operations = {
> > > > .open = afs_open,
> > > > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > > > };
> > > >
> > > > static const struct vm_operations_struct afs_vm_ops = {
> > > > + .mapped = afs_mapped,
> > > > .open = afs_vm_open,
> > > > .close = afs_vm_close,
> > > > .fault = filemap_fault,
> > > > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > > > afs_add_open_mmap(vnode);
> > >
> > > Is the above afs_add_open_mmap an additional one, which could cause a reference
> > > leak? Does the above one need to be removed and only the one in afs_mapped()
> > > needs to be kept?
> >
> > Ah yeah good spot, will fix thanks!
> >
> > >
> > > >
> > > > ret = generic_file_mmap_prepare(desc);
> > > > - if (ret == 0)
> > > > - desc->vm_ops = &afs_vm_ops;
> > > > - else
> > > > - afs_drop_open_mmap(vnode);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + desc->vm_ops = &afs_vm_ops;
> > > > return ret;
> > > > }
> > > >
> > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > + const struct file *file, void **vm_private_data)
> > > > +{
> > > > + struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > > > +
> > > > + afs_add_open_mmap(vnode);
> > > > + return 0;
> > > > +}
> > > > +
> > > > static void afs_vm_open(struct vm_area_struct *vma)
> > > > {
> > > > afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > > > --
> > > > 2.53.0
> > > >
> > > >
> >
> > Cheers, Lorenzo
* Re: [PATCH 03/15] mm: document vm_operations_struct->open the same as close()
2026-03-16 0:43 ` Suren Baghdasaryan
@ 2026-03-16 14:31 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 14:31 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 05:43:41PM -0700, Suren Baghdasaryan wrote:
> On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > Describe when the operation is invoked and the context in which it is
> > invoked, matching the description already added for vm_op->close().
> >
> > While we're here, update all outdated references to an 'area' field for
> > VMAs to the more consistent 'vma'.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > include/linux/mm.h | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index cc5960a84382..12a0b4c63736 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -748,15 +748,20 @@ struct vm_uffd_ops;
> > * to the functions called when a no-page or a wp-page exception occurs.
> > */
> > struct vm_operations_struct {
> > - void (*open)(struct vm_area_struct * area);
> > + /**
> > + * @open: Called when a VMA is remapped or split. Not called upon first
> > + * mapping a VMA.
>
> It's also called from dup_mmap() which is part of forking.
Ah yup :) will update thanks!
>
> > + * Context: User context. May sleep. Caller holds mmap_lock.
> > + */
> > + void (*open)(struct vm_area_struct *vma);
> > /**
> > * @close: Called when the VMA is being removed from the MM.
> > * Context: User context. May sleep. Caller holds mmap_lock.
> > */
> > - void (*close)(struct vm_area_struct * area);
> > + void (*close)(struct vm_area_struct *vma);
> > /* Called any time before splitting to check if it's allowed */
> > - int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> > - int (*mremap)(struct vm_area_struct *area);
> > + int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> > + int (*mremap)(struct vm_area_struct *vma);
> > /*
> > * Called by mprotect() to make driver-specific permission
> > * checks before mprotect() is finalised. The VMA must not
> > @@ -768,7 +773,7 @@ struct vm_operations_struct {
> > vm_fault_t (*huge_fault)(struct vm_fault *vmf, unsigned int order);
> > vm_fault_t (*map_pages)(struct vm_fault *vmf,
> > pgoff_t start_pgoff, pgoff_t end_pgoff);
> > - unsigned long (*pagesize)(struct vm_area_struct * area);
> > + unsigned long (*pagesize)(struct vm_area_struct *vma);
> >
> > /* notification that a previously read-only page is about to become
> > * writable, if an error is returned it will cause a SIGBUS */
> > --
> > 2.53.0
> >
Cheers, Lorenzo
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-15 22:56 ` Suren Baghdasaryan
2026-03-15 23:06 ` Suren Baghdasaryan
@ 2026-03-16 14:44 ` Lorenzo Stoakes (Oracle)
2026-03-16 21:27 ` Suren Baghdasaryan
1 sibling, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 14:44 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 03:56:54PM -0700, Suren Baghdasaryan wrote:
> On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > Rather than passing arbitrary fields, pass an mmap_action field directly to
> > mmap prepare and complete helpers to put all the action-specific logic in
> > the function actually doing the work.
> >
> > Additionally, allow mmap prepare functions to return an error so we can
> > error out as soon as possible if there is something logically incorrect in
> > the input.
> >
> > Update remap_pfn_range_prepare() to properly check the input range for the
> > CoW case.
>
> By "properly check" do you mean the replacement of desc->start and
> desc->end with action->remap.start and action->remap.start +
> action->remap.size when calling get_remap_pgoff() from
> remap_pfn_range_prepare()?
>
> >
> > While we're here, make remap_pfn_range_prepare_vma() a little neater, and
> > pass mmap_action directly to call_action_complete().
> >
> > Then, update compat_vma_mmap() to perform its logic directly, as
> > __compat_vma_map() is not used by anything so we don't need to export it.
>
> Not directly related to this patch but while reviewing, I was also
> checking vma locking rules in this mmap_prepare() + mmap() sequence
> and I noticed that the new VMA flag modification functions like
> vma_set_flags_mask() do assert vma_assert_locked(vma). It would be
Do NOT? :)
I don't think it'd work, because in some cases you're setting flags for a
VMA that is not yet inserted in the tree, etc.
I don't think it's hugely useful to split out these functions the way the
vm_flags_*() stuff is split, so that we assert sometimes and not others.
I'd rather keep this as clean an interface as possible.
In any case, in the majority of cases flags are being set on the descriptor
rather than the VMA, so really only core code is affected, and that would
likely already be asserting where it needs to.
The cases where drivers will do it, all of them will be using
vma_desc_set_flags() etc.
> useful to add these but as a separate change. I will add it to my todo
> list.
So I don't think it'd be generally useful at this time.
>
> >
> > Also update compat_vma_mmap() to use vfs_mmap_prepare() rather than calling
> > the mmap_prepare op directly.
> >
> > Finally, update the VMA userland tests to reflect the changes.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > include/linux/fs.h | 2 -
> > include/linux/mm.h | 8 +--
> > mm/internal.h | 28 +++++---
> > mm/memory.c | 45 +++++++-----
> > mm/util.c | 112 +++++++++++++-----------------
> > mm/vma.c | 21 +++---
> > tools/testing/vma/include/dup.h | 9 ++-
> > tools/testing/vma/include/stubs.h | 9 +--
> > 8 files changed, 123 insertions(+), 111 deletions(-)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 8b3dd145b25e..a2628a12bd2b 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2058,8 +2058,6 @@ static inline bool can_mmap_file(struct file *file)
> > return true;
> > }
> >
> > -int __compat_vma_mmap(const struct file_operations *f_op,
> > - struct file *file, struct vm_area_struct *vma);
> > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> >
> > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 4c4fd55fc823..cc5960a84382 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4116,10 +4116,10 @@ static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
> > mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
> > }
> >
> > -void mmap_action_prepare(struct mmap_action *action,
> > - struct vm_area_desc *desc);
> > -int mmap_action_complete(struct mmap_action *action,
> > - struct vm_area_struct *vma);
> > +int mmap_action_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action);
> > +int mmap_action_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action);
> >
> > /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> > static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 95b583e7e4f7..7bfa85b5e78b 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1775,26 +1775,32 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
> > void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
> > int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
> >
> > -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
> > -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> > - unsigned long pfn, unsigned long size, pgprot_t pgprot);
> > +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action);
> > +int remap_pfn_range_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action);
> >
> > -static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > - unsigned long orig_pfn, unsigned long size)
> > +static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
> > {
> > + const unsigned long orig_pfn = action->remap.start_pfn;
> > + const unsigned long size = action->remap.size;
> > const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> >
> > - return remap_pfn_range_prepare(desc, pfn);
> > + action->remap.start_pfn = pfn;
> > + return remap_pfn_range_prepare(desc, action);
> > }
> >
> > static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
> > - unsigned long addr, unsigned long orig_pfn, unsigned long size,
> > - pgprot_t orig_prot)
> > + struct mmap_action *action)
> > {
> > - const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > - const pgprot_t prot = pgprot_decrypted(orig_prot);
> > + const unsigned long size = action->remap.size;
> > + const unsigned long orig_pfn = action->remap.start_pfn;
> > + const pgprot_t orig_prot = vma->vm_page_prot;
> >
> > - return remap_pfn_range_complete(vma, addr, pfn, size, prot);
> > + action->remap.pgprot = pgprot_decrypted(orig_prot);
> > + action->remap.start_pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > + return remap_pfn_range_complete(vma, action);
> > }
> >
> > #ifdef CONFIG_MMU_NOTIFIER
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 6aa0ea4af1fc..364fa8a45360 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3099,26 +3099,34 @@ static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> > }
> > #endif
> >
> > -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
> > +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
> > {
> > - /*
> > - * We set addr=VMA start, end=VMA end here, so this won't fail, but we
> > - * check it again on complete and will fail there if specified addr is
> > - * invalid.
> > - */
> > - get_remap_pgoff(vma_desc_is_cow_mapping(desc), desc->start, desc->end,
> > - desc->start, desc->end, pfn, &desc->pgoff);
> > + const unsigned long start = action->remap.start;
> > + const unsigned long end = start + action->remap.size;
> > + const unsigned long pfn = action->remap.start_pfn;
> > + const bool is_cow = vma_desc_is_cow_mapping(desc);
>
> I was trying to figure out who sets action->remap.start and
> action->remap.size and if they somehow guaranteed to be always equal
> to desc->start and (desc->end - desc->start). My understanding is that
> action->remap.start and action->remap.size are set by
> f_op->mmap_prepare() but I'm not sure if they are always the same as
> desc->start and (desc->end - desc->start) and if so, how do we enforce
> that.
They are set, and they might not always be the same, because the existing
implementation does not set them the same.
Once I've completed the change, I can check to ensure that nobody is doing
anything crazy with this.
I also plan to add specific discontiguous range handlers to handle the
cases where drivers wish to map that way.
In fact, I already implemented it (and the DMA coherent stuff) but stripped
it out of the series for now for time reasons (the original series was ~27
patches :) as I want to test it more etc.
Users have access to mmap_action_remap_full() to specify that they want to
remap the full range.
>
> > + int err;
> > +
> > + err = get_remap_pgoff(is_cow, start, end, desc->start, desc->end, pfn,
> > + &desc->pgoff);
> > + if (err)
> > + return err;
> > +
> > vma_desc_set_flags_mask(desc, VMA_REMAP_FLAGS);
> > + return 0;
> > }
> >
> > -static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
> > - unsigned long pfn, unsigned long size)
> > +static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma,
> > + unsigned long addr, unsigned long pfn,
> > + unsigned long size)
> > {
> > - unsigned long end = addr + PAGE_ALIGN(size);
> > + const unsigned long end = addr + PAGE_ALIGN(size);
> > + const bool is_cow = is_cow_mapping(vma->vm_flags);
> > int err;
> >
> > - err = get_remap_pgoff(is_cow_mapping(vma->vm_flags), addr, end,
> > - vma->vm_start, vma->vm_end, pfn, &vma->vm_pgoff);
> > + err = get_remap_pgoff(is_cow, addr, end, vma->vm_start, vma->vm_end,
> > + pfn, &vma->vm_pgoff);
> > if (err)
> > return err;
> >
> > @@ -3151,10 +3159,15 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> > }
> > EXPORT_SYMBOL(remap_pfn_range);
> >
> > -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> > - unsigned long pfn, unsigned long size, pgprot_t prot)
> > +int remap_pfn_range_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action)
> > {
> > - return do_remap_pfn_range(vma, addr, pfn, size, prot);
> > + const unsigned long start = action->remap.start;
> > + const unsigned long pfn = action->remap.start_pfn;
> > + const unsigned long size = action->remap.size;
> > + const pgprot_t prot = action->remap.pgprot;
> > +
> > + return do_remap_pfn_range(vma, start, pfn, size, prot);
> > }
> >
> > /**
> > diff --git a/mm/util.c b/mm/util.c
> > index ce7ae80047cf..dba1191725b6 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1163,43 +1163,6 @@ void flush_dcache_folio(struct folio *folio)
> > EXPORT_SYMBOL(flush_dcache_folio);
> > #endif
> >
> > -/**
> > - * __compat_vma_mmap() - See description for compat_vma_mmap()
> > - * for details. This is the same operation, only with a specific file operations
> > - * struct which may or may not be the same as vma->vm_file->f_op.
> > - * @f_op: The file operations whose .mmap_prepare() hook is specified.
> > - * @file: The file which backs or will back the mapping.
> > - * @vma: The VMA to apply the .mmap_prepare() hook to.
> > - * Returns: 0 on success or error.
> > - */
> > -int __compat_vma_mmap(const struct file_operations *f_op,
> > - struct file *file, struct vm_area_struct *vma)
> > -{
> > - struct vm_area_desc desc = {
> > - .mm = vma->vm_mm,
> > - .file = file,
> > - .start = vma->vm_start,
> > - .end = vma->vm_end,
> > -
> > - .pgoff = vma->vm_pgoff,
> > - .vm_file = vma->vm_file,
> > - .vma_flags = vma->flags,
> > - .page_prot = vma->vm_page_prot,
> > -
> > - .action.type = MMAP_NOTHING, /* Default */
> > - };
> > - int err;
> > -
> > - err = f_op->mmap_prepare(&desc);
> > - if (err)
> > - return err;
> > -
> > - mmap_action_prepare(&desc.action, &desc);
> > - set_vma_from_desc(vma, &desc);
> > - return mmap_action_complete(&desc.action, vma);
> > -}
> > -EXPORT_SYMBOL(__compat_vma_mmap);
> > -
> > /**
> > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > * existing VMA and execute any requested actions.
> > @@ -1228,7 +1191,31 @@ EXPORT_SYMBOL(__compat_vma_mmap);
> > */
> > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> > - return __compat_vma_mmap(file->f_op, file, vma);
> > + struct vm_area_desc desc = {
> > + .mm = vma->vm_mm,
> > + .file = file,
> > + .start = vma->vm_start,
> > + .end = vma->vm_end,
> > +
> > + .pgoff = vma->vm_pgoff,
> > + .vm_file = vma->vm_file,
> > + .vma_flags = vma->flags,
> > + .page_prot = vma->vm_page_prot,
> > +
> > + .action.type = MMAP_NOTHING, /* Default */
> > + };
> > + int err;
> > +
> > + err = vfs_mmap_prepare(file, &desc);
> > + if (err)
> > + return err;
> > +
> > + err = mmap_action_prepare(&desc, &desc.action);
> > + if (err)
> > + return err;
> > +
> > + set_vma_from_desc(vma, &desc);
> > + return mmap_action_complete(vma, &desc.action);
> > }
> > EXPORT_SYMBOL(compat_vma_mmap);
> >
> > @@ -1320,8 +1307,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
> > }
> > }
> >
> > -static int mmap_action_finish(struct mmap_action *action,
> > - const struct vm_area_struct *vma, int err)
> > +static int mmap_action_finish(struct vm_area_struct *vma,
> > + struct mmap_action *action, int err)
> > {
> > /*
> > * If an error occurs, unmap the VMA altogether and return an error. We
> > @@ -1355,35 +1342,36 @@ static int mmap_action_finish(struct mmap_action *action,
> > * action which need to be performed.
> > * @desc: The VMA descriptor to prepare for @action.
> > * @action: The action to perform.
> > + *
> > + * Returns: 0 on success, otherwise error.
> > */
> > -void mmap_action_prepare(struct mmap_action *action,
> > - struct vm_area_desc *desc)
> > +int mmap_action_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
>
> Any reason you are swapping the arguments?
For consistency with other functions to be added.
> It also looks like we always call mmap_action_prepare() with action ==
> desc->action, like this: mmap_action_prepare(&desc.action, &desc). Why
> don't we eliminate the action parameter altogether and use desc.action
> from inside the function?
I think in previous iterations I thought about overriding one action with
another and wanted to keep that flexibility, but then have never done that
in practice.
So probably I can just drop that yes, will try it on respin.
>
> > +
>
> extra new line.
Ack will fix
>
> > {
> > switch (action->type) {
> > case MMAP_NOTHING:
> > - break;
> > + return 0;
> > case MMAP_REMAP_PFN:
> > - remap_pfn_range_prepare(desc, action->remap.start_pfn);
> > - break;
> > + return remap_pfn_range_prepare(desc, action);
> > case MMAP_IO_REMAP_PFN:
> > - io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> > - action->remap.size);
> > - break;
> > + return io_remap_pfn_range_prepare(desc, action);
> > }
> > }
> > EXPORT_SYMBOL(mmap_action_prepare);
> >
> > /**
> > * mmap_action_complete - Execute VMA descriptor action.
> > - * @action: The action to perform.
> > * @vma: The VMA to perform the action upon.
> > + * @action: The action to perform.
> > *
> > * Similar to mmap_action_prepare().
> > *
> > * Return: 0 on success, or error, at which point the VMA will be unmapped.
> > */
> > -int mmap_action_complete(struct mmap_action *action,
> > - struct vm_area_struct *vma)
> > +int mmap_action_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action)
> > +
> > {
> > int err = 0;
> >
> > @@ -1391,23 +1379,19 @@ int mmap_action_complete(struct mmap_action *action,
> > case MMAP_NOTHING:
> > break;
> > case MMAP_REMAP_PFN:
> > - err = remap_pfn_range_complete(vma, action->remap.start,
> > - action->remap.start_pfn, action->remap.size,
> > - action->remap.pgprot);
> > + err = remap_pfn_range_complete(vma, action);
> > break;
> > case MMAP_IO_REMAP_PFN:
> > - err = io_remap_pfn_range_complete(vma, action->remap.start,
> > - action->remap.start_pfn, action->remap.size,
> > - action->remap.pgprot);
> > + err = io_remap_pfn_range_complete(vma, action);
> > break;
> > }
> >
> > - return mmap_action_finish(action, vma, err);
> > + return mmap_action_finish(vma, action, err);
> > }
> > EXPORT_SYMBOL(mmap_action_complete);
> > #else
> > -void mmap_action_prepare(struct mmap_action *action,
> > - struct vm_area_desc *desc)
> > +int mmap_action_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
> > {
> > switch (action->type) {
> > case MMAP_NOTHING:
> > @@ -1417,11 +1401,13 @@ void mmap_action_prepare(struct mmap_action *action,
> > WARN_ON_ONCE(1); /* nommu cannot handle these. */
> > break;
> > }
> > +
> > + return 0;
> > }
> > EXPORT_SYMBOL(mmap_action_prepare);
> >
> > -int mmap_action_complete(struct mmap_action *action,
> > - struct vm_area_struct *vma)
> > +int mmap_action_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action)
> > {
> > int err = 0;
> >
> > @@ -1436,7 +1422,7 @@ int mmap_action_complete(struct mmap_action *action,
> > break;
> > }
> >
> > - return mmap_action_finish(action, vma, err);
> > + return mmap_action_finish(vma, action, err);
> > }
> > EXPORT_SYMBOL(mmap_action_complete);
> > #endif
> > diff --git a/mm/vma.c b/mm/vma.c
> > index be64f781a3aa..054cf1d262fb 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -2613,15 +2613,19 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
> > vma_set_page_prot(vma);
> > }
> >
> > -static void call_action_prepare(struct mmap_state *map,
> > - struct vm_area_desc *desc)
> > +static int call_action_prepare(struct mmap_state *map,
> > + struct vm_area_desc *desc)
> > {
> > struct mmap_action *action = &desc->action;
> > + int err;
> >
> > - mmap_action_prepare(action, desc);
> > + err = mmap_action_prepare(desc, action);
> > + if (err)
> > + return err;
> >
> > if (action->hide_from_rmap_until_complete)
> > map->hold_file_rmap_lock = true;
> > + return 0;
> > }
> >
> > /*
> > @@ -2645,7 +2649,9 @@ static int call_mmap_prepare(struct mmap_state *map,
> > if (err)
> > return err;
> >
> > - call_action_prepare(map, desc);
> > + err = call_action_prepare(map, desc);
> > + if (err)
> > + return err;
> >
> > /* Update fields permitted to be changed. */
> > map->pgoff = desc->pgoff;
> > @@ -2700,13 +2706,12 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > }
> >
> > static int call_action_complete(struct mmap_state *map,
> > - struct vm_area_desc *desc,
> > + struct mmap_action *action,
> > struct vm_area_struct *vma)
> > {
> > - struct mmap_action *action = &desc->action;
> > int ret;
> >
> > - ret = mmap_action_complete(action, vma);
> > + ret = mmap_action_complete(vma, action);
> >
> > /* If we held the file rmap we need to release it. */
> > if (map->hold_file_rmap_lock) {
> > @@ -2768,7 +2773,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > __mmap_complete(&map, vma);
> >
> > if (have_mmap_prepare && allocated_new) {
> > - error = call_action_complete(&map, &desc, vma);
> > + error = call_action_complete(&map, &desc.action, vma);
> >
> > if (error)
> > return error;
> > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > index 5eb313beb43d..908beb263307 100644
> > --- a/tools/testing/vma/include/dup.h
> > +++ b/tools/testing/vma/include/dup.h
> > @@ -1106,7 +1106,7 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
> >
> > .pgoff = vma->vm_pgoff,
> > .vm_file = vma->vm_file,
> > - .vm_flags = vma->vm_flags,
> > + .vma_flags = vma->flags,
> > .page_prot = vma->vm_page_prot,
> >
> > .action.type = MMAP_NOTHING, /* Default */
> > @@ -1117,9 +1117,12 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
> > if (err)
> > return err;
> >
> > - mmap_action_prepare(&desc.action, &desc);
> > + err = mmap_action_prepare(&desc, &desc.action);
> > + if (err)
> > + return err;
> > +
> > set_vma_from_desc(vma, &desc);
> > - return mmap_action_complete(&desc.action, vma);
> > + return mmap_action_complete(vma, &desc.action);
> > }
> >
> > static inline int compat_vma_mmap(struct file *file,
> > diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
> > index 947a3a0c2566..76c4b668bc62 100644
> > --- a/tools/testing/vma/include/stubs.h
> > +++ b/tools/testing/vma/include/stubs.h
> > @@ -81,13 +81,14 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
> > {
> > }
> >
> > -static inline void mmap_action_prepare(struct mmap_action *action,
> > - struct vm_area_desc *desc)
> > +static inline int mmap_action_prepare(struct vm_area_desc *desc,
> > + struct mmap_action *action)
> > {
> > + return 0;
> > }
> >
> > -static inline int mmap_action_complete(struct mmap_action *action,
> > - struct vm_area_struct *vma)
> > +static inline int mmap_action_complete(struct vm_area_struct *vma,
> > + struct mmap_action *action)
> > {
> > return 0;
> > }
> > --
> > 2.53.0
> >
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-15 23:06 ` Suren Baghdasaryan
@ 2026-03-16 14:47 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 14:47 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 04:06:48PM -0700, Suren Baghdasaryan wrote:
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -4116,10 +4116,10 @@ static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
> > > mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
> > > }
> > >
> > > -void mmap_action_prepare(struct mmap_action *action,
> > > - struct vm_area_desc *desc);
> > > -int mmap_action_complete(struct mmap_action *action,
> > > - struct vm_area_struct *vma);
> > > +int mmap_action_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action);
> > > +int mmap_action_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action);
> > >
> > > /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> > > static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> > > diff --git a/mm/internal.h b/mm/internal.h
> > > index 95b583e7e4f7..7bfa85b5e78b 100644
> > > --- a/mm/internal.h
> > > +++ b/mm/internal.h
> > > @@ -1775,26 +1775,32 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
> > > void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
> > > int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
> > >
> > > -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
> > > -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> > > - unsigned long pfn, unsigned long size, pgprot_t pgprot);
> > > +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action);
> > > +int remap_pfn_range_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action);
> > >
> > > -static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > - unsigned long orig_pfn, unsigned long size)
> > > +static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action)
> > > {
> > > + const unsigned long orig_pfn = action->remap.start_pfn;
> > > + const unsigned long size = action->remap.size;
> > > const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > >
> > > - return remap_pfn_range_prepare(desc, pfn);
> > > + action->remap.start_pfn = pfn;
> > > + return remap_pfn_range_prepare(desc, action);
> > > }
> > >
> > > static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
> > > - unsigned long addr, unsigned long orig_pfn, unsigned long size,
> > > - pgprot_t orig_prot)
> > > + struct mmap_action *action)
> > > {
> > > - const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > > - const pgprot_t prot = pgprot_decrypted(orig_prot);
> > > + const unsigned long size = action->remap.size;
> > > + const unsigned long orig_pfn = action->remap.start_pfn;
> > > + const pgprot_t orig_prot = vma->vm_page_prot;
> > >
> > > - return remap_pfn_range_complete(vma, addr, pfn, size, prot);
> > > + action->remap.pgprot = pgprot_decrypted(orig_prot);
>
> I'm guessing it doesn't really matter but after this change
> action->remap.pgprot will store the decrypted value while before this
> change it was kept the way mmap_prepare() originally set it. We pass
> the action structure later to mmap_actpion_finish() but it does not use
> action->remap.pgprot, so this probably doesn't matter.
Yeah it doesn't really matter either way.
Cheers, Lorenzo
* Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
2026-03-13 0:12 ` Randy Dunlap
@ 2026-03-16 14:51 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 14:51 UTC (permalink / raw)
To: Randy Dunlap
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, Mar 12, 2026 at 05:12:04PM -0700, Randy Dunlap wrote:
> (Andrew: patch attached)
>
>
> On 3/12/26 1:27 PM, Lorenzo Stoakes (Oracle) wrote:
>
> Documentation/filesystems/mmap_prepare.rst: WARNING: document isn't included in any toctree [toc.not_included]
>
> Should be in some index.rst file. In filesystems I suppose.
Ack thanks.
>
> > ---
> > Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> > 1 file changed, 131 insertions(+)
> > create mode 100644 Documentation/filesystems/mmap_prepare.rst
> >
> > diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> > new file mode 100644
> > index 000000000000..76908200f3a1
> > --- /dev/null
> > +++ b/Documentation/filesystems/mmap_prepare.rst
> > @@ -0,0 +1,131 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===========================
> > +mmap_prepare callback HOWTO
> > +===========================
> > +
> > +Introduction
> > +############
>
> Kernel style is "=============" above instead of "############".
Ack
>
> > +
> > +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> > +stability and security risk, and doesn't always permit the merging of adjacent
> > +mappings resulting in unnecessary memory fragmentation.
> > +
> > +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> > +these problems.
> > +
> > +## How To Use
> > +
> > +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> > +callback rather than an `mmap` one, e.g. for ext4:
> > +
> > +
> > +.. code-block:: C
> > +
> > + const struct file_operations ext4_file_operations = {
> > + ...
> > + .mmap_prepare = ext4_file_mmap_prepare,
> > + };
> > +
> > +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> > +
> > +Examining the `struct vm_area_desc` type:
> > +
> > +.. code-block:: C
> > +
> > + struct vm_area_desc {
> > + /* Immutable state. */
> > + const struct mm_struct *const mm;
> > + struct file *const file; /* May vary from vm_file in stacked callers. */
> > + unsigned long start;
> > + unsigned long end;
> > +
> > + /* Mutable fields. Populated with initial state. */
> > + pgoff_t pgoff;
> > + struct file *vm_file;
> > + vma_flags_t vma_flags;
> > + pgprot_t page_prot;
> > +
> > + /* Write-only fields. */
> > + const struct vm_operations_struct *vm_ops;
> > + void *private_data;
> > +
> > + /* Take further action? */
> > + struct mmap_action action;
> > + };
> > +
> > +This is straightforward - you have all the fields you need to set up the
> > +mapping, and you can update the mutable and writable fields, for instance:
> > +
> > +.. code-block:: Cw
>
> .. code-block:: C
>
> Documentation/filesystems/mmap_prepare.rst:60: WARNING: Pygments lexer name 'Cw' is not known [misc.highlighting_failure]
>
> Maybe a typo?
Yeah is a typo thanks!
>
> > +
> > + static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> > + {
> > + int ret;
> > + struct file *file = desc->file;
> > + struct inode *inode = file->f_mapping->host;
> > +
> > + ...
> > +
> > + file_accessed(file);
> > + if (IS_DAX(file_inode(file))) {
> > + desc->vm_ops = &ext4_dax_vm_ops;
> > + vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> > + } else {
> > + desc->vm_ops = &ext4_file_vm_ops;
> > + }
> > + return 0;
> > + }
> > +
> > +Importantly, you no longer have to dance around with reference counts or locks
> > +when updating these fields - __you can simply go ahead and change them__.
> > +
> > +Everything is taken care of by the mapping code.
> > +
> > +VMA Flags
> > +=========
>
> and then use "---------------" here instead of "==============".
Ack
>
> (from Documentation/doc-guide/sphinx.rst)
>
> > +
> > +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> > +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> > +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> > +locking done correctly for you, this is no longer necessary.
> > +
> > +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> > +etc. - i.e. using a `VM_xxx` macro has changed too.
> > +
> > +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> > +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> > +of (where `desc` is a pointer to `struct vma_area_desc`):
> > +
> > +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> > + wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
> > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> > + otherwise `false`.
> > +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> > + additional flags specified by a comma-separated list,
> > + e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> > +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> > + flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> > +
> > +Actions
> > +=======
> > +
> > +You can now very easily have actions be performed upon a mapping once set up by
> > +utilising simple helper functions invoked upon the `struct vm_area_desc`
> > +pointer. These are:
> > +
> > +* `mmap_action_remap()` - Remaps a range consisting only of PFNs for a specific
> > + range starting a virtual address and PFN number of a set size.
> > +
> > +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> > + entire mapping from `start_pfn` onward.
> > +
> > +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> > + remap.
> > +
> > +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> > + the entire mapping from `start_pfn` onward.
> > +
> > +**NOTE:** The 'action' field should never normally be manipulated directly,
> > +rather you ought to use one of these helpers.
>
> I also see this warning, but I don't know what it is referring to:
>
> Documentation/filesystems/mmap_prepare.rst:132: ERROR: Anonymous hyperlink mismatch: 1 references but 0 targets.
> See "backrefs" attribute for IDs. [docutils]
>
> (OK, I found/fixed that also.)
>
> There are also lots of single ` marks which mean italics. I thought those were
> not what was intended, so I changed (most of) them to `` marks, which means
> "code block / monospace". I can fix those if needed.
>
> from the patch file:
> @Lorenzo: ISTR that you prefer explicit quoting on structs and
> functions. I didn't do that here since kernel automarkup does that,
> but if you prefer, I can redo the patch with those changes.
The issue was that in another document it didn't seem to properly recognise
the types AFAICT (but I might have been mistaken anyway!). But I'm fine without.
>
> HTH.
> --
> ~Randy
Thanks for this, will fold the patch into the respin also!
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]()
2026-03-12 23:15 ` Randy Dunlap
@ 2026-03-16 14:54 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 14:54 UTC (permalink / raw)
To: Randy Dunlap
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Thu, Mar 12, 2026 at 04:15:26PM -0700, Randy Dunlap wrote:
>
> On 3/12/26 1:27 PM, Lorenzo Stoakes (Oracle) wrote:
>
> > Finally, we update the VMA tests accordingly to reflect the changes.
>
> IMO we could omit the word "we" 5 times above.
> (but no change is required)
>
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 88f42faeb377..88ad5649c02d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
>
> > +/**
> > + * range_is_subset - Is the specified inner range a subset of the outer range?
> > + * @outer_start: The start of the outer range.
> > + * @outer_end: The exclusive end of the outer range.
> > + * @inner_start: The start of the inner range.
> > + * @inner_end: The exclusive end of the inner range.
> > + *
> > + * Returns %true if [inner_start, inner_end) is a subset of [outer_start,
>
> * Returns:
> (for kernel-doc)
Ack
>
> > + * outer_end), otherwise %false.
> > + */
> > +static inline bool range_is_subset(unsigned long outer_start,
> > + unsigned long outer_end,
> > + unsigned long inner_start,
> > + unsigned long inner_end)
> > +{
> > + return outer_start <= inner_start && inner_end <= outer_end;
> > +}
> > +
> > +/**
> > + * range_in_vma - is the specified [@start, @end) range a subset of the VMA?
> > + * @vma: The VMA against which we want to check [@start, @end).
> > + * @start: The start of the range we wish to check.
> > + * @end: The exclusive end of the range we wish to check.
> > + *
> > + * Returns %true if [@start, @end) is a subset of [@vma->vm_start,
>
> * Returns:
Ack
>
> > + * @vma->vm_end), %false otherwise.
> > + */
> > static inline bool range_in_vma(const struct vm_area_struct *vma,
> > unsigned long start, unsigned long end)
> > {
> > - return (vma && vma->vm_start <= start && end <= vma->vm_end);
> > + if (!vma)
> > + return false;
> > +
> > + return range_is_subset(vma->vm_start, vma->vm_end, start, end);
> > +}
> > +
> > +/**
> > + * range_in_vma_desc - is the specified [@start, @end) range a subset of the VMA
> > + * described by @desc, a VMA descriptor?
> > + * @desc: The VMA descriptor against which we want to check [@start, @end).
> > + * @start: The start of the range we wish to check.
> > + * @end: The exclusive end of the range we wish to check.
> > + *
> > + * Returns %true if [@start, @end) is a subset of [@desc->start, @desc->end),
>
> * Returns:
Ack, I think in general I've seen (or believe I've seen :) other cases without
the colon, so was kinda imitating, but I may also be imagining that ;)
>
> > + * %false otherwise.
> > + */
> > +static inline bool range_in_vma_desc(const struct vm_area_desc *desc,
> > + unsigned long start, unsigned long end)
> > +{
> > + if (!desc)
> > + return false;
> > +
> > + return range_is_subset(desc->start, desc->end, start, end);
> > }
>
> --
> ~Randy
>
Will also fold these changes into the respin!
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
2026-03-15 23:23 ` Suren Baghdasaryan
@ 2026-03-16 19:16 ` Lorenzo Stoakes (Oracle)
2026-03-16 22:59 ` Suren Baghdasaryan
0 siblings, 1 reply; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-16 19:16 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Sun, Mar 15, 2026 at 04:23:14PM -0700, Suren Baghdasaryan wrote:
> On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > This documentation makes it easier for a driver/file system implementer to
> > correctly use this callback.
> >
> > It covers the fundamentals, whilst intentionally leaving the less lovely
> > possible actions one might take undocumented (for instance - the
> > success_hook, error_hook fields in mmap_action).
> >
> > The document also covers the new VMA flags implementation which is the only
> > one which will work correctly with mmap_prepare.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> > 1 file changed, 131 insertions(+)
> > create mode 100644 Documentation/filesystems/mmap_prepare.rst
> >
> > diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> > new file mode 100644
> > index 000000000000..76908200f3a1
> > --- /dev/null
> > +++ b/Documentation/filesystems/mmap_prepare.rst
> > @@ -0,0 +1,131 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===========================
> > +mmap_prepare callback HOWTO
> > +===========================
> > +
> > +Introduction
> > +############
> > +
> > +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> > +stability and security risk, and doesn't always permit the merging of adjacent
> > +mappings, resulting in unnecessary memory fragmentation.
> > +
> > +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> > +these problems.
> > +
> > +How To Use
> > +==========
> > +
> > +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> > +callback rather than an `mmap` one, e.g. for ext4:
> > +
> > +
> > +.. code-block:: C
> > +
> > + const struct file_operations ext4_file_operations = {
> > + ...
> > + .mmap_prepare = ext4_file_mmap_prepare,
> > + };
> > +
> > +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> > +
> > +Examining the `struct vm_area_desc` type:
> > +
> > +.. code-block:: C
> > +
> > + struct vm_area_desc {
> > + /* Immutable state. */
> > + const struct mm_struct *const mm;
> > + struct file *const file; /* May vary from vm_file in stacked callers. */
> > + unsigned long start;
> > + unsigned long end;
> > +
> > + /* Mutable fields. Populated with initial state. */
> > + pgoff_t pgoff;
> > + struct file *vm_file;
> > + vma_flags_t vma_flags;
> > + pgprot_t page_prot;
> > +
> > + /* Write-only fields. */
> > + const struct vm_operations_struct *vm_ops;
> > + void *private_data;
> > +
> > + /* Take further action? */
> > + struct mmap_action action;
>
> So, action still belongs to /* Write-only fields. */ section? This is
> nitpicky, but it might be better to have this as:
>
> /* Write-only fields. */
> const struct vm_operations_struct *vm_ops;
> void *private_data;
> struct mmap_action action; /* Take further action? */
Absolutely not. This field is not to be written to by the user.
We sadly have to allow hugetlb to do some hacks, but these are things we don't
want to point out.
Users should use mmap_action_xxx() functions.
>
> > + };
> > +
> > +This is straightforward - you have all the fields you need to set up the
> > +mapping, and you can update the mutable and writable fields, for instance:
> > +
> > +.. code-block:: C
> > +
> > + static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> > + {
> > + int ret;
> > + struct file *file = desc->file;
> > + struct inode *inode = file->f_mapping->host;
> > +
> > + ...
> > +
> > + file_accessed(file);
> > + if (IS_DAX(file_inode(file))) {
> > + desc->vm_ops = &ext4_dax_vm_ops;
> > + vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> > + } else {
> > + desc->vm_ops = &ext4_file_vm_ops;
> > + }
> > + return 0;
> > + }
> > +
> > +Importantly, you no longer have to dance around with reference counts or locks
> > +when updating these fields - **you can simply go ahead and change them**.
> > +
> > +Everything is taken care of by the mapping code.
> > +
> > +VMA Flags
> > +=========
> > +
> > +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> > +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> > +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> > +locking done correctly for you), this is no longer necessary.
> > +
> > +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> > +etc. - i.e. using a `VM_xxx` macro has changed too.
> > +
> > +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> > +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> > +of (where `desc` is a pointer to `struct vm_area_desc`):
> > +
> > +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> > + wish to test for (whether *any* are set), e.g. - `vma_desc_test_flags(desc,
> > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> > + otherwise `false`.
> > +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> > + additional flags specified by a comma-separated list,
> > + e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> > +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> > + flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> > +
> > +Actions
> > +=======
> > +
> > +You can now easily have actions performed on a mapping once it is set up, by
> > +using simple helper functions invoked on the `struct vm_area_desc`
> > +pointer. These are:
> > +
> > +* `mmap_action_remap()` - Remaps a range consisting only of PFNs, of a
> > + given size, starting at a specified virtual address and PFN.
> > +
> > +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> > + entire mapping from `start_pfn` onward.
> > +
> > +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> > + remap.
> > +
> > +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> > + the entire mapping from `start_pfn` onward.
> > +
> > +**NOTE:** The 'action' field should never normally be manipulated directly,
> > +rather you ought to use one of these helpers.
>
> I'm guessing the start and size parameters passed to
> mmap_action_remap() and such are restricted by vm_area_desc.start
> vm_area_desc.end. If so, should we document those restrictions and
> enforce them in the code?
I mean the same restrictions already apply as with all of the existing
functions, if you were to use them with a VMA descriptor.
I think implicitly a remap will fail if you try it out of the VMA range at the
point of applying the change.
But it might be worth adding range_in_vma_desc() checks at prepare time, will
see if I can do that for the respin.
I think it's pretty obvious that you shouldn't be trying to remap totally
unrelated memory, so I'm not sure that's at a level of granularity that's suited
to this document though.
>
> > + struct vm_area_desc {
> > + /* Immutable state. */
> > + const struct mm_struct *const mm;
> > + struct file *const file; /* May vary from vm_file in stacked callers. */
> > + unsigned long start;
> > + unsigned long end;
>
>
> > --
> > 2.53.0
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
2026-03-16 14:44 ` Lorenzo Stoakes (Oracle)
@ 2026-03-16 21:27 ` Suren Baghdasaryan
0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-16 21:27 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Mon, Mar 16, 2026 at 7:44 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Sun, Mar 15, 2026 at 03:56:54PM -0700, Suren Baghdasaryan wrote:
> > On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > >
> > > Rather than passing arbitrary fields, pass an mmap_action field directly to
> > > mmap prepare and complete helpers to put all the action-specific logic in
> > > the function actually doing the work.
> > >
> > > Additionally, allow mmap prepare functions to return an error so we can
> > > error out as soon as possible if there is something logically incorrect in
> > > the input.
> > >
> > > Update remap_pfn_range_prepare() to properly check the input range for the
> > > CoW case.
> >
> > By "properly check" do you mean the replacement of desc->start and
> > desc->end with action->remap.start and action->remap.start +
> > action->remap.size when calling get_remap_pgoff() from
> > remap_pfn_range_prepare()?
> >
> > >
> > > While we're here, make remap_pfn_range_prepare_vma() a little neater, and
> > > pass mmap_action directly to call_action_complete().
> > >
> > > Then, update compat_vma_mmap() to perform its logic directly, as
> > > __compat_vma_map() is not used by anything so we don't need to export it.
> >
> > Not directly related to this patch but while reviewing, I was also
> > checking vma locking rules in this mmap_prepare() + mmap() sequence
> > and I noticed that the new VMA flag modification functions like
> > vma_set_flags_mask() do assert vma_assert_locked(vma). It would be
>
> Do NOT? :)
Right :)
>
> I don't think it'd work, because in some cases you're setting flags for a
> VMA that is not yet inserted in the tree, etc.
Ah, I see. So, there won't be something similar to vm_flags_init()
that sets vm_flags before the VMA is added to the tree...
I'm a bit paranoid about catching the cases when a VMA is changed
without being locked. Maybe we can add such assert if
vma_is_attached() later. But this is really out of scope of this
patchset, so let's discuss it later. Sorry for the noise.
>
> I don't think it's hugely useful to split out these functions the way the
> vm_flags_*() stuff is split, so that we assert sometimes and not
> others.
>
> I'd rather keep this as clean an interface as possible.
Ack.
>
> In any case the majority of cases where flags are being set are not on the
> VMA, so really only core code, that would likely otherwise assert when it
> needs to, would already be asserting.
>
> The cases where drivers will do it, all of them will be using
> vma_desc_set_flags() etc.
That was my biggest worry as drivers might do some VMA modifications
without proper locking but you are right, with mmap_prepare() that
stops being a problem.
>
> > useful to add these but as a separate change. I will add it to my todo
> > list.
>
> So I don't think it'd be generally useful at this time.
>
> >
> > >
> > > Also update compat_vma_mmap() to use vfs_mmap_prepare() rather than calling
> > > the mmap_prepare op directly.
> > >
> > > Finally, update the VMA userland tests to reflect the changes.
> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > > include/linux/fs.h | 2 -
> > > include/linux/mm.h | 8 +--
> > > mm/internal.h | 28 +++++---
> > > mm/memory.c | 45 +++++++-----
> > > mm/util.c | 112 +++++++++++++-----------------
> > > mm/vma.c | 21 +++---
> > > tools/testing/vma/include/dup.h | 9 ++-
> > > tools/testing/vma/include/stubs.h | 9 +--
> > > 8 files changed, 123 insertions(+), 111 deletions(-)
> > >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 8b3dd145b25e..a2628a12bd2b 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -2058,8 +2058,6 @@ static inline bool can_mmap_file(struct file *file)
> > > return true;
> > > }
> > >
> > > -int __compat_vma_mmap(const struct file_operations *f_op,
> > > - struct file *file, struct vm_area_struct *vma);
> > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > >
> > > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 4c4fd55fc823..cc5960a84382 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -4116,10 +4116,10 @@ static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
> > > mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
> > > }
> > >
> > > -void mmap_action_prepare(struct mmap_action *action,
> > > - struct vm_area_desc *desc);
> > > -int mmap_action_complete(struct mmap_action *action,
> > > - struct vm_area_struct *vma);
> > > +int mmap_action_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action);
> > > +int mmap_action_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action);
> > >
> > > /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> > > static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> > > diff --git a/mm/internal.h b/mm/internal.h
> > > index 95b583e7e4f7..7bfa85b5e78b 100644
> > > --- a/mm/internal.h
> > > +++ b/mm/internal.h
> > > @@ -1775,26 +1775,32 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
> > > void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
> > > int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
> > >
> > > -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
> > > -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> > > - unsigned long pfn, unsigned long size, pgprot_t pgprot);
> > > +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action);
> > > +int remap_pfn_range_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action);
> > >
> > > -static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > - unsigned long orig_pfn, unsigned long size)
> > > +static inline int io_remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action)
> > > {
> > > + const unsigned long orig_pfn = action->remap.start_pfn;
> > > + const unsigned long size = action->remap.size;
> > > const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > >
> > > - return remap_pfn_range_prepare(desc, pfn);
> > > + action->remap.start_pfn = pfn;
> > > + return remap_pfn_range_prepare(desc, action);
> > > }
> > >
> > > static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
> > > - unsigned long addr, unsigned long orig_pfn, unsigned long size,
> > > - pgprot_t orig_prot)
> > > + struct mmap_action *action)
> > > {
> > > - const unsigned long pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > > - const pgprot_t prot = pgprot_decrypted(orig_prot);
> > > + const unsigned long size = action->remap.size;
> > > + const unsigned long orig_pfn = action->remap.start_pfn;
> > > + const pgprot_t orig_prot = vma->vm_page_prot;
> > >
> > > - return remap_pfn_range_complete(vma, addr, pfn, size, prot);
> > > + action->remap.pgprot = pgprot_decrypted(orig_prot);
> > > + action->remap.start_pfn = io_remap_pfn_range_pfn(orig_pfn, size);
> > > + return remap_pfn_range_complete(vma, action);
> > > }
> > >
> > > #ifdef CONFIG_MMU_NOTIFIER
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 6aa0ea4af1fc..364fa8a45360 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -3099,26 +3099,34 @@ static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> > > }
> > > #endif
> > >
> > > -void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
> > > +int remap_pfn_range_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action)
> > > {
> > > - /*
> > > - * We set addr=VMA start, end=VMA end here, so this won't fail, but we
> > > - * check it again on complete and will fail there if specified addr is
> > > - * invalid.
> > > - */
> > > - get_remap_pgoff(vma_desc_is_cow_mapping(desc), desc->start, desc->end,
> > > - desc->start, desc->end, pfn, &desc->pgoff);
> > > + const unsigned long start = action->remap.start;
> > > + const unsigned long end = start + action->remap.size;
> > > + const unsigned long pfn = action->remap.start_pfn;
> > > + const bool is_cow = vma_desc_is_cow_mapping(desc);
> >
> > I was trying to figure out who sets action->remap.start and
> > action->remap.size and if they somehow guaranteed to be always equal
> > to desc->start and (desc->end - desc->start). My understanding is that
> > action->remap.start and action->remap.size are set by
> > f_op->mmap_prepare() but I'm not sure if they are always the same as
> > desc->start and (desc->end - desc->start) and if so, how do we enforce
> > that.
>
> They are set, and they might not always be the same, because the existing
> implementation does not set them the same.
>
> Once I've completed the change, I can check to ensure that nobody is doing
> anything crazy with this.
>
> I also plan to add specific discontiguous range handlers to handle the
> cases where drivers wish to map that way.
>
> In fact, I already implemented it (and DMA coherent stuff) but stripped it
> out of the series for now for time (the original series was ~27 patches :) as
> I want to test that more etc.
>
> Users have access to mmap_action_remap_full() to specify that they want to
> remap the full range.
Got it. IOW [action->remap.start,
action->remap.start+action->remap.size] should be equal to or contained
within the [desc->start, desc->end] range.
>
> >
> > > + int err;
> > > +
> > > + err = get_remap_pgoff(is_cow, start, end, desc->start, desc->end, pfn,
> > > + &desc->pgoff);
> > > + if (err)
> > > + return err;
> > > +
> > > vma_desc_set_flags_mask(desc, VMA_REMAP_FLAGS);
> > > + return 0;
> > > }
> > >
> > > -static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
> > > - unsigned long pfn, unsigned long size)
> > > +static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma,
> > > + unsigned long addr, unsigned long pfn,
> > > + unsigned long size)
> > > {
> > > - unsigned long end = addr + PAGE_ALIGN(size);
> > > + const unsigned long end = addr + PAGE_ALIGN(size);
> > > + const bool is_cow = is_cow_mapping(vma->vm_flags);
> > > int err;
> > >
> > > - err = get_remap_pgoff(is_cow_mapping(vma->vm_flags), addr, end,
> > > - vma->vm_start, vma->vm_end, pfn, &vma->vm_pgoff);
> > > + err = get_remap_pgoff(is_cow, addr, end, vma->vm_start, vma->vm_end,
> > > + pfn, &vma->vm_pgoff);
> > > if (err)
> > > return err;
> > >
> > > @@ -3151,10 +3159,15 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> > > }
> > > EXPORT_SYMBOL(remap_pfn_range);
> > >
> > > -int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
> > > - unsigned long pfn, unsigned long size, pgprot_t prot)
> > > +int remap_pfn_range_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action)
> > > {
> > > - return do_remap_pfn_range(vma, addr, pfn, size, prot);
> > > + const unsigned long start = action->remap.start;
> > > + const unsigned long pfn = action->remap.start_pfn;
> > > + const unsigned long size = action->remap.size;
> > > + const pgprot_t prot = action->remap.pgprot;
> > > +
> > > + return do_remap_pfn_range(vma, start, pfn, size, prot);
> > > }
> > >
> > > /**
> > > diff --git a/mm/util.c b/mm/util.c
> > > index ce7ae80047cf..dba1191725b6 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -1163,43 +1163,6 @@ void flush_dcache_folio(struct folio *folio)
> > > EXPORT_SYMBOL(flush_dcache_folio);
> > > #endif
> > >
> > > -/**
> > > - * __compat_vma_mmap() - See description for compat_vma_mmap()
> > > - * for details. This is the same operation, only with a specific file operations
> > > - * struct which may or may not be the same as vma->vm_file->f_op.
> > > - * @f_op: The file operations whose .mmap_prepare() hook is specified.
> > > - * @file: The file which backs or will back the mapping.
> > > - * @vma: The VMA to apply the .mmap_prepare() hook to.
> > > - * Returns: 0 on success or error.
> > > - */
> > > -int __compat_vma_mmap(const struct file_operations *f_op,
> > > - struct file *file, struct vm_area_struct *vma)
> > > -{
> > > - struct vm_area_desc desc = {
> > > - .mm = vma->vm_mm,
> > > - .file = file,
> > > - .start = vma->vm_start,
> > > - .end = vma->vm_end,
> > > -
> > > - .pgoff = vma->vm_pgoff,
> > > - .vm_file = vma->vm_file,
> > > - .vma_flags = vma->flags,
> > > - .page_prot = vma->vm_page_prot,
> > > -
> > > - .action.type = MMAP_NOTHING, /* Default */
> > > - };
> > > - int err;
> > > -
> > > - err = f_op->mmap_prepare(&desc);
> > > - if (err)
> > > - return err;
> > > -
> > > - mmap_action_prepare(&desc.action, &desc);
> > > - set_vma_from_desc(vma, &desc);
> > > - return mmap_action_complete(&desc.action, vma);
> > > -}
> > > -EXPORT_SYMBOL(__compat_vma_mmap);
> > > -
> > > /**
> > > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > > * existing VMA and execute any requested actions.
> > > @@ -1228,7 +1191,31 @@ EXPORT_SYMBOL(__compat_vma_mmap);
> > > */
> > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > {
> > > - return __compat_vma_mmap(file->f_op, file, vma);
> > > + struct vm_area_desc desc = {
> > > + .mm = vma->vm_mm,
> > > + .file = file,
> > > + .start = vma->vm_start,
> > > + .end = vma->vm_end,
> > > +
> > > + .pgoff = vma->vm_pgoff,
> > > + .vm_file = vma->vm_file,
> > > + .vma_flags = vma->flags,
> > > + .page_prot = vma->vm_page_prot,
> > > +
> > > + .action.type = MMAP_NOTHING, /* Default */
> > > + };
> > > + int err;
> > > +
> > > + err = vfs_mmap_prepare(file, &desc);
> > > + if (err)
> > > + return err;
> > > +
> > > + err = mmap_action_prepare(&desc, &desc.action);
> > > + if (err)
> > > + return err;
> > > +
> > > + set_vma_from_desc(vma, &desc);
> > > + return mmap_action_complete(vma, &desc.action);
> > > }
> > > EXPORT_SYMBOL(compat_vma_mmap);
> > >
> > > @@ -1320,8 +1307,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
> > > }
> > > }
> > >
> > > -static int mmap_action_finish(struct mmap_action *action,
> > > - const struct vm_area_struct *vma, int err)
> > > +static int mmap_action_finish(struct vm_area_struct *vma,
> > > + struct mmap_action *action, int err)
> > > {
> > > /*
> > > * If an error occurs, unmap the VMA altogether and return an error. We
> > > @@ -1355,35 +1342,36 @@ static int mmap_action_finish(struct mmap_action *action,
> > > * action which need to be performed.
> > > * @desc: The VMA descriptor to prepare for @action.
> > > * @action: The action to perform.
> > > + *
> > > + * Returns: 0 on success, otherwise error.
> > > */
> > > -void mmap_action_prepare(struct mmap_action *action,
> > > - struct vm_area_desc *desc)
> > > +int mmap_action_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action)
> >
> > Any reason you are swapping the arguments?
>
> For consistency with other functions to be added.
>
> > It also looks like we always call mmap_action_prepare() with action ==
> > desc->action, like this: mmap_action_prepare(&desc.action, &desc). Why
> > don't we eliminate the action parameter altogether and use desc.action
> > from inside the function?
>
> I think in previous iterations I thought about overriding one action with
> another and wanted to keep that flexibility, but then have never done that
> in practice.
>
> So probably I can just drop that yes, will try it on respin.
Thanks.
>
> >
> > > +
> >
> > extra new line.
>
> Ack will fix
Thanks.
>
> >
> > > {
> > > switch (action->type) {
> > > case MMAP_NOTHING:
> > > - break;
> > > + return 0;
> > > case MMAP_REMAP_PFN:
> > > - remap_pfn_range_prepare(desc, action->remap.start_pfn);
> > > - break;
> > > + return remap_pfn_range_prepare(desc, action);
> > > case MMAP_IO_REMAP_PFN:
> > > - io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> > > - action->remap.size);
> > > - break;
> > > + return io_remap_pfn_range_prepare(desc, action);
> > > }
> > > }
> > > EXPORT_SYMBOL(mmap_action_prepare);
> > >
> > > /**
> > > * mmap_action_complete - Execute VMA descriptor action.
> > > - * @action: The action to perform.
> > > * @vma: The VMA to perform the action upon.
> > > + * @action: The action to perform.
> > > *
>
> > > * Similar to mmap_action_prepare().
> > > *
> > > * Return: 0 on success, or error, at which point the VMA will be unmapped.
> > > */
> > > -int mmap_action_complete(struct mmap_action *action,
> > > - struct vm_area_struct *vma)
> > > +int mmap_action_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action)
> > > +
> > > {
> > > int err = 0;
> > >
> > > @@ -1391,23 +1379,19 @@ int mmap_action_complete(struct mmap_action *action,
> > > case MMAP_NOTHING:
> > > break;
> > > case MMAP_REMAP_PFN:
> > > - err = remap_pfn_range_complete(vma, action->remap.start,
> > > - action->remap.start_pfn, action->remap.size,
> > > - action->remap.pgprot);
> > > + err = remap_pfn_range_complete(vma, action);
> > > break;
> > > case MMAP_IO_REMAP_PFN:
> > > - err = io_remap_pfn_range_complete(vma, action->remap.start,
> > > - action->remap.start_pfn, action->remap.size,
> > > - action->remap.pgprot);
> > > + err = io_remap_pfn_range_complete(vma, action);
> > > break;
> > > }
> > >
> > > - return mmap_action_finish(action, vma, err);
> > > + return mmap_action_finish(vma, action, err);
> > > }
> > > EXPORT_SYMBOL(mmap_action_complete);
> > > #else
> > > -void mmap_action_prepare(struct mmap_action *action,
> > > - struct vm_area_desc *desc)
> > > +int mmap_action_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action)
> > > {
> > > switch (action->type) {
> > > case MMAP_NOTHING:
> > > @@ -1417,11 +1401,13 @@ void mmap_action_prepare(struct mmap_action *action,
> > > WARN_ON_ONCE(1); /* nommu cannot handle these. */
> > > break;
> > > }
> > > +
> > > + return 0;
> > > }
> > > EXPORT_SYMBOL(mmap_action_prepare);
> > >
> > > -int mmap_action_complete(struct mmap_action *action,
> > > - struct vm_area_struct *vma)
> > > +int mmap_action_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action)
> > > {
> > > int err = 0;
> > >
> > > @@ -1436,7 +1422,7 @@ int mmap_action_complete(struct mmap_action *action,
> > > break;
> > > }
> > >
> > > - return mmap_action_finish(action, vma, err);
> > > + return mmap_action_finish(vma, action, err);
> > > }
> > > EXPORT_SYMBOL(mmap_action_complete);
> > > #endif
> > > diff --git a/mm/vma.c b/mm/vma.c
> > > index be64f781a3aa..054cf1d262fb 100644
> > > --- a/mm/vma.c
> > > +++ b/mm/vma.c
> > > @@ -2613,15 +2613,19 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
> > > vma_set_page_prot(vma);
> > > }
> > >
> > > -static void call_action_prepare(struct mmap_state *map,
> > > - struct vm_area_desc *desc)
> > > +static int call_action_prepare(struct mmap_state *map,
> > > + struct vm_area_desc *desc)
> > > {
> > > struct mmap_action *action = &desc->action;
> > > + int err;
> > >
> > > - mmap_action_prepare(action, desc);
> > > + err = mmap_action_prepare(desc, action);
> > > + if (err)
> > > + return err;
> > >
> > > if (action->hide_from_rmap_until_complete)
> > > map->hold_file_rmap_lock = true;
> > > + return 0;
> > > }
> > >
> > > /*
> > > @@ -2645,7 +2649,9 @@ static int call_mmap_prepare(struct mmap_state *map,
> > > if (err)
> > > return err;
> > >
> > > - call_action_prepare(map, desc);
> > > + err = call_action_prepare(map, desc);
> > > + if (err)
> > > + return err;
> > >
> > > /* Update fields permitted to be changed. */
> > > map->pgoff = desc->pgoff;
> > > @@ -2700,13 +2706,12 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > > }
> > >
> > > static int call_action_complete(struct mmap_state *map,
> > > - struct vm_area_desc *desc,
> > > + struct mmap_action *action,
> > > struct vm_area_struct *vma)
> > > {
> > > - struct mmap_action *action = &desc->action;
> > > int ret;
> > >
> > > - ret = mmap_action_complete(action, vma);
> > > + ret = mmap_action_complete(vma, action);
> > >
> > > /* If we held the file rmap we need to release it. */
> > > if (map->hold_file_rmap_lock) {
> > > @@ -2768,7 +2773,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > __mmap_complete(&map, vma);
> > >
> > > if (have_mmap_prepare && allocated_new) {
> > > - error = call_action_complete(&map, &desc, vma);
> > > + error = call_action_complete(&map, &desc.action, vma);
> > >
> > > if (error)
> > > return error;
> > > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > > index 5eb313beb43d..908beb263307 100644
> > > --- a/tools/testing/vma/include/dup.h
> > > +++ b/tools/testing/vma/include/dup.h
> > > @@ -1106,7 +1106,7 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
> > >
> > > .pgoff = vma->vm_pgoff,
> > > .vm_file = vma->vm_file,
> > > - .vm_flags = vma->vm_flags,
> > > + .vma_flags = vma->flags,
> > > .page_prot = vma->vm_page_prot,
> > >
> > > .action.type = MMAP_NOTHING, /* Default */
> > > @@ -1117,9 +1117,12 @@ static inline int __compat_vma_mmap(const struct file_operations *f_op,
> > > if (err)
> > > return err;
> > >
> > > - mmap_action_prepare(&desc.action, &desc);
> > > + err = mmap_action_prepare(&desc, &desc.action);
> > > + if (err)
> > > + return err;
> > > +
> > > set_vma_from_desc(vma, &desc);
> > > - return mmap_action_complete(&desc.action, vma);
> > > + return mmap_action_complete(vma, &desc.action);
> > > }
> > >
> > > static inline int compat_vma_mmap(struct file *file,
> > > diff --git a/tools/testing/vma/include/stubs.h b/tools/testing/vma/include/stubs.h
> > > index 947a3a0c2566..76c4b668bc62 100644
> > > --- a/tools/testing/vma/include/stubs.h
> > > +++ b/tools/testing/vma/include/stubs.h
> > > @@ -81,13 +81,14 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
> > > {
> > > }
> > >
> > > -static inline void mmap_action_prepare(struct mmap_action *action,
> > > - struct vm_area_desc *desc)
> > > +static inline int mmap_action_prepare(struct vm_area_desc *desc,
> > > + struct mmap_action *action)
> > > {
> > > + return 0;
> > > }
> > >
> > > -static inline int mmap_action_complete(struct mmap_action *action,
> > > - struct vm_area_struct *vma)
> > > +static inline int mmap_action_complete(struct vm_area_struct *vma,
> > > + struct mmap_action *action)
> > > {
> > > return 0;
> > > }
> > > --
> > > 2.53.0
> > >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
2026-03-16 19:16 ` Lorenzo Stoakes (Oracle)
@ 2026-03-16 22:59 ` Suren Baghdasaryan
0 siblings, 0 replies; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-16 22:59 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Mon, Mar 16, 2026 at 12:17 PM Lorenzo Stoakes (Oracle)
<ljs@kernel.org> wrote:
>
> On Sun, Mar 15, 2026 at 04:23:14PM -0700, Suren Baghdasaryan wrote:
> > On Thu, Mar 12, 2026 at 1:27 PM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > >
> > > This documentation makes it easier for a driver/file system implementer to
> > > correctly use this callback.
> > >
> > > It covers the fundamentals, whilst intentionally leaving the less lovely
> > > possible actions one might take undocumented (for instance - the
> > > success_hook, error_hook fields in mmap_action).
> > >
> > > The document also covers the new VMA flags implementation which is the only
> > > one which will work correctly with mmap_prepare.
> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > > Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
> > > 1 file changed, 131 insertions(+)
> > > create mode 100644 Documentation/filesystems/mmap_prepare.rst
> > >
> > > diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> > > new file mode 100644
> > > index 000000000000..76908200f3a1
> > > --- /dev/null
> > > +++ b/Documentation/filesystems/mmap_prepare.rst
> > > @@ -0,0 +1,131 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +
> > > +===========================
> > > +mmap_prepare callback HOWTO
> > > +===========================
> > > +
> > > +Introduction
> > > +############
> > > +
> > > +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> > > +stability and security risk, and doesn't always permit the merging of adjacent
> > > +mappings resulting in unnecessary memory fragmentation.
> > > +
> > > +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> > > +these problems.
> > > +
> > > +How To Use
> > > +##########
> > > +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> > > +callback rather than an `mmap` one, e.g. for ext4:
> > > +
> > > +
> > > +.. code-block:: C
> > > +
> > > + const struct file_operations ext4_file_operations = {
> > > + ...
> > > + .mmap_prepare = ext4_file_mmap_prepare,
> > > + };
> > > +
> > > +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> > > +
> > > +Examining the `struct vm_area_desc` type:
> > > +
> > > +.. code-block:: C
> > > +
> > > + struct vm_area_desc {
> > > + /* Immutable state. */
> > > + const struct mm_struct *const mm;
> > > + struct file *const file; /* May vary from vm_file in stacked callers. */
> > > + unsigned long start;
> > > + unsigned long end;
> > > +
> > > + /* Mutable fields. Populated with initial state. */
> > > + pgoff_t pgoff;
> > > + struct file *vm_file;
> > > + vma_flags_t vma_flags;
> > > + pgprot_t page_prot;
> > > +
> > > + /* Write-only fields. */
> > > + const struct vm_operations_struct *vm_ops;
> > > + void *private_data;
> > > +
> > > + /* Take further action? */
> > > + struct mmap_action action;
> >
> > So, action still belongs to /* Write-only fields. */ section? This is
> > nitpicky, but it might be better to have this as:
> >
> > /* Write-only fields. */
> > const struct vm_operations_struct *vm_ops;
> > void *private_data;
> > struct mmap_action action; /* Take further action? */
>
> Absolutely not. This field is not to be written to by the user.
>
> We sadly have to allow hugetlb to do some hacks, but these are things we don't
> want to point out.
Ack.
>
> Users should use mmap_action_xxx() functions.
>
> >
> > > + };
> > > +
> > > +This is straightforward - you have all the fields you need to set up the
> > > +mapping, and you can update the mutable and writable fields, for instance:
> > > +
> > > +.. code-block:: C
> > > +
> > > + static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> > > + {
> > > + int ret;
> > > + struct file *file = desc->file;
> > > + struct inode *inode = file->f_mapping->host;
> > > +
> > > + ...
> > > +
> > > + file_accessed(file);
> > > + if (IS_DAX(file_inode(file))) {
> > > + desc->vm_ops = &ext4_dax_vm_ops;
> > > + vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> > > + } else {
> > > + desc->vm_ops = &ext4_file_vm_ops;
> > > + }
> > > + return 0;
> > > + }
> > > +
> > > +Importantly, you no longer have to dance around with reference counts or locks
> > > +when updating these fields - **you can simply go ahead and change them**.
> > > +
> > > +Everything is taken care of by the mapping code.
> > > +
> > > +VMA Flags
> > > +=========
> > > +
> > > +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> > > +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> > > +`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
> > > +locking done correctly for you), this is no longer necessary.
> > > +
> > > +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> > > +etc. - i.e. using a `VM_xxx` macro has changed too.
> > > +
> > > +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> > > +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> > > +of (where `desc` is a pointer to `struct vm_area_desc`):
> > > +
> > > +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> > > + wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
> > > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> > > + otherwise `false`.
> > > +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> > > + additional flags specified by a comma-separated list,
> > > + e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> > > +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> > > + flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> > > + VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> > > +
> > > +Actions
> > > +=======
> > > +
> > > +You can now very easily have actions be performed upon a mapping once set up by
> > > +utilising simple helper functions invoked upon the `struct vm_area_desc`
> > > +pointer. These are:
> > > +
> > > +* `mmap_action_remap()` - Remaps a range consisting only of PFNs, starting at a
> > > +  given virtual address and PFN, for a specified size.
> > > +
> > > +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> > > + entire mapping from `start_pfn` onward.
> > > +
> > > +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> > > + remap.
> > > +
> > > +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> > > + the entire mapping from `start_pfn` onward.
> > > +
> > > +**NOTE:** The 'action' field should never normally be manipulated directly;
> > > +rather, you ought to use one of these helpers.
> >
> > I'm guessing the start and size parameters passed to
> > mmap_action_remap() and such are restricted by vm_area_desc.start and
> > vm_area_desc.end. If so, should we document those restrictions and
> > enforce them in the code?
>
> I mean it's the same restrictions as all of the functions already apply if you
> were to use them with a VMA descriptor.
>
> I think implicitly a remap will fail if you try it out of the VMA range at the
> point of applying the change.
>
> But it might be worth adding range_in_vma_desc() checks at prepare time, will
> see if I can do that for the respin.
>
> I think it's pretty obvious that you shouldn't be trying to remap totally
> unrelated memory, so I'm not sure that's at a level of granularity that's suited
> to this document though.
I just saw you already have WARN_ON_ONCE() inside mmap_action_remap()
to check for these limits, so codewise I think we are already good.
For documentation I'll rely on your judgement whether to mention this or not.
>
> >
> > > + struct vm_area_desc {
> > > + /* Immutable state. */
> > > + const struct mm_struct *const mm;
> > > + struct file *const file; /* May vary from vm_file in stacked callers. */
> > > + unsigned long start;
> > > + unsigned long end;
> >
> >
> > > --
> > > 2.53.0
> > >
* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-16 13:39 ` Lorenzo Stoakes (Oracle)
@ 2026-03-16 23:39 ` Suren Baghdasaryan
2026-03-17 8:42 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-16 23:39 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Mon, Mar 16, 2026 at 6:39 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Sun, Mar 15, 2026 at 07:18:38PM -0700, Suren Baghdasaryan wrote:
> > On Fri, Mar 13, 2026 at 4:58 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > >
> > > On Fri, Mar 13, 2026 at 04:02:36AM -0700, Usama Arif wrote:
> > > > On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > > >
> > > > > Previously, when a driver needed to do something like establish a reference
> > > > > count, it could do so in the mmap hook in the knowledge that the mapping
> > > > > would succeed.
> > > > >
> > > > > With the introduction of f_op->mmap_prepare this is no longer the case, as
> > > > > it is invoked prior to actually establishing the mapping.
> > > > >
> > > > > To take this into account, introduce a new vm_ops->mapped callback which is
> > > > > invoked when the VMA is first mapped (though notably - not when it is
> > > > > merged - which is correct and mirrors existing mmap/open/close behaviour).
> > > > >
> > > > > We do better than vm_ops->open() here, as this callback can return an
> > > > > error, at which point the VMA will be unmapped.
> > > > >
> > > > > Note that vm_ops->mapped() is invoked after any mmap action is
> > > > > complete (such as I/O remapping).
> > > > >
> > > > > We intentionally do not expose the VMA at this point, exposing only the
> > > > > fields that could be used, and an output parameter in case the operation
> > > > > needs to update the vma->vm_private_data field.
> > > > >
> > > > > In order to deal with stacked filesystems which invoke inner filesystem's
> > > > > mmap() invocations, add __compat_vma_mapped() and invoke it on
> > > > > vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> > > > > handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> > > > > callback.
> > > > >
> > > > > We can now also remove call_action_complete() and invoke
> > > > > mmap_action_complete() directly, as we separate out the rmap lock logic to
> > > > > be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> > > > >
> > > > > We also abstract unmapping of a VMA on mmap action completion into its own
> > > > > helper function, unmap_vma_locked().
> > > > >
> > > > > Additionally, update VMA userland test headers to reflect the change.
> > > > >
> > > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > > ---
> > > > > include/linux/fs.h | 9 +++-
> > > > > include/linux/mm.h | 17 +++++++
> > > > > mm/internal.h | 10 ++++
> > > > > mm/util.c | 86 ++++++++++++++++++++++++---------
> > > > > mm/vma.c | 41 +++++++++++-----
> > > > > tools/testing/vma/include/dup.h | 34 ++++++++++++-
> > > > > 6 files changed, 158 insertions(+), 39 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > > index a2628a12bd2b..c390f5c667e3 100644
> > > > > --- a/include/linux/fs.h
> > > > > +++ b/include/linux/fs.h
> > > > > @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> > > > > }
> > > > >
> > > > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > > > > +int __vma_check_mmap_hook(struct vm_area_struct *vma);
> > > > >
> > > > > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > {
> > > > > + int err;
> > > > > +
> > > > > if (file->f_op->mmap_prepare)
> > > > > return compat_vma_mmap(file, vma);
> > > > >
> > > > > - return file->f_op->mmap(file, vma);
> > > > > + err = file->f_op->mmap(file, vma);
> > > > > + if (err)
> > > > > + return err;
> > > > > +
> > > > > + return __vma_check_mmap_hook(vma);
> > > > > }
> > > > >
> > > > > static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > index 12a0b4c63736..7333d5db1221 100644
> > > > > --- a/include/linux/mm.h
> > > > > +++ b/include/linux/mm.h
> > > > > @@ -759,6 +759,23 @@ struct vm_operations_struct {
> > > > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > */
> > > > > void (*close)(struct vm_area_struct *vma);
> > > > > + /**
> > > > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > > > + * the new VMA is merged with an adjacent VMA.
> > > > > + *
> > > > > + * The @vm_private_data field is an output field allowing the user to
> > > > > + * modify vma->vm_private_data as necessary.
> > > > > + *
> > > > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > > > + * set from f_op->mmap.
> > > > > + *
> > > > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > > > + * be unmapped.
> > > > > + *
> > > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > + */
> > > > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > + const struct file *file, void **vm_private_data);
> > > > > /* Called any time before splitting to check if it's allowed */
> > > > > int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> > > > > int (*mremap)(struct vm_area_struct *vma);
> > > > > diff --git a/mm/internal.h b/mm/internal.h
> > > > > index 7bfa85b5e78b..f0f2cf1caa36 100644
> > > > > --- a/mm/internal.h
> > > > > +++ b/mm/internal.h
> > > > > @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> > > > > * mmap hook and safely handle error conditions. On error, VMA hooks will be
> > > > > * mutated.
> > > > > *
> > > > > + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> > > > > + *
> >
> > What exactly would one do to "prefer f_op->mmap_prepare()"?
>
> I'm saying a person should implement f_op->mmap_prepare() rather than
> f_op->mmap(), since the latter is deprecated :)
>
> I think that's pretty clear no?
>
> > Since you are adding this comment for mmap_file(), I think you need to
> > describe more specifically what one should call instead.
>
> I think it'd be a complete distraction, since if you're at the point of calling
> mmap_file() you're already not implementing mmap_prepare except as a compatibility
> layer.
Yep, it seems like a warning that comes too late.
>
> I mean maybe I'll just drop this as it seems to be causing confusion.
Maybe instead we add a comment that f_ops->mmap is deprecated in favor
of f_ops->mmap_prepare() in here:
https://elixir.bootlin.com/linux/v7.0-rc4/source/include/linux/fs.h#L1940
?
>
> >
> > > > > * @file: File which backs the mapping.
> > > > > * @vma: VMA which we are mapping.
> > > > > *
> > > > > @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> > > > > /* unmap_vmas is in mm/memory.c */
> > > > > void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
> > > > >
> > > > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > > > +{
> > > > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > > +
> > > > > + mmap_assert_locked(vma->vm_mm);
> >
> > You must hold the mmap write lock when unmapping. Would be better to
> > assert mmap_assert_write_locked() or even vma_assert_write_locked(),
> > which implies mmap_assert_write_locked().
>
> I'm not sure why we don't assert this in those paths.
>
> I think I assumed we could only assert readonly because one of those paths
> downgrades the mmap write lock to a read lock.
>
> I don't think we can do a VMA write lock assert here, since at the point of
> do_munmap() all callers can't possibly have the VMA write lock, since they are
> _looking up_ the VMA at the specified address.
It sounds strange to me that we are unmapping a VMA that was not
locked beforehand. Let me look into the call chains a bit more to
convince myself one way or the other. The fact that do_munmap() looks
up the VMA by address and then write-locks it inside
vms_gather_munmap_vmas() does not mean the VMA was not already locked.
vma_start_write() is re-entrant.
>
> But I can convert this to an mmap_assert_write_locked()!
Ok, let's go with that. I don't want to slow down your patchset while
I investigate locking rules here. We can strengthen the assertion
later.
>
> >
> > > > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > > > +}
> > > > > +
> > > > > #ifdef CONFIG_MMU
> > > > >
> > > > > static inline void get_anon_vma(struct anon_vma *anon_vma)
> > > > > diff --git a/mm/util.c b/mm/util.c
> > > > > index dba1191725b6..2b0ed54008d6 100644
> > > > > --- a/mm/util.c
> > > > > +++ b/mm/util.c
> > > > > @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> > > > > EXPORT_SYMBOL(flush_dcache_folio);
> > > > > #endif
> > > > >
> > > > > +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > +{
> > > > > + struct vm_area_desc desc = {
> > > > > + .mm = vma->vm_mm,
> > > > > + .file = file,
> > > > > + .start = vma->vm_start,
> > > > > + .end = vma->vm_end,
> > > > > +
> > > > > + .pgoff = vma->vm_pgoff,
> > > > > + .vm_file = vma->vm_file,
> > > > > + .vma_flags = vma->flags,
> > > > > + .page_prot = vma->vm_page_prot,
> > > > > +
> > > > > + .action.type = MMAP_NOTHING, /* Default */
> > > > > + };
> > > > > + int err;
> > > > > +
> > > > > + err = vfs_mmap_prepare(file, &desc);
> > > > > + if (err)
> > > > > + return err;
> > > > > +
> > > > > + err = mmap_action_prepare(&desc, &desc.action);
> > > > > + if (err)
> > > > > + return err;
> > > > > +
> > > > > + set_vma_from_desc(vma, &desc);
> > > > > + return mmap_action_complete(vma, &desc.action);
> > > > > +}
> > > > > +
> > > > > +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> > > > > +{
> > > > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > > > + void *vm_private_data = vma->vm_private_data;
> > > > > + int err;
> > > > > +
> > > > > + if (!vm_ops->mapped)
> > > > > + return 0;
> > > > > +
> > > >
> > > > Hello!
> > > >
> > > > Can vm_ops be NULL here? __compat_vma_mapped() is called from
> > > > compat_vma_mmap(), which is reached when a filesystem provides
> > > > mmap_prepare. If the mmap_prepare hook does not set desc->vm_ops,
> > > > vma->vm_ops will be NULL and this dereferences a NULL pointer.
> > >
> > > I _think_ for this to ever be invoked, you would need to be dealing with a
> > > file-backed VMA so vm_ops->fault would HAVE to be defined.
> > >
> > > But you're right anyway as a matter of principle we should check it! Will fix.
> > >
> > > >
> > > > For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
> > > > a NULL pointer dereference here.
> > > >
> > > > Would need to do
> > > > if (!vm_ops || !vm_ops->mapped)
> > > > return 0;
> > > >
> > > > here
> > >
> > > Yes.
> > >
> > > >
> > > >
> > > > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> > > > > + &vm_private_data);
> > > > > + if (err)
> > > > > + unmap_vma_locked(vma);
> > > >
> > > > when mapped() returns an error, unmap_vma_locked(vma) is called
> > > > but execution continues into the vm_private_data update below. After
> > > > unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
> > > > entirely), so accessing vma->vm_private_data after that is a
> > > > use-after-free.
> > >
> > > Very good point :) will fix thanks!
> > >
> > > Probably:
> > >
> > > if (err)
> > > unmap_vma_locked(vma);
> > > else if (vm_private_data != vma->vm_private_data)
> > > vma->vm_private_data = vm_private_data;
> > >
> > > return err;
> > >
> > > Would be fine.
> > >
> > > >
> > > > Probably need to do:
> > > > if (err) {
> > > > unmap_vma_locked(vma);
> > > > return err;
> > > > }
> > > >
> > > > > + /* Update private data if changed. */
> > > > > + if (vm_private_data != vma->vm_private_data)
> > > > > + vma->vm_private_data = vm_private_data;
> > > > > +
> > > > > + return err;
> > > > > +}
> > > > > +
> > > > > /**
> > > > > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > > > > * existing VMA and execute any requested actions.
> > > > > @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> > > > > */
> > > > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > {
> > > > > - struct vm_area_desc desc = {
> > > > > - .mm = vma->vm_mm,
> > > > > - .file = file,
> > > > > - .start = vma->vm_start,
> > > > > - .end = vma->vm_end,
> > > > > -
> > > > > - .pgoff = vma->vm_pgoff,
> > > > > - .vm_file = vma->vm_file,
> > > > > - .vma_flags = vma->flags,
> > > > > - .page_prot = vma->vm_page_prot,
> > > > > -
> > > > > - .action.type = MMAP_NOTHING, /* Default */
> > > > > - };
> > > > > int err;
> > > > >
> > > > > - err = vfs_mmap_prepare(file, &desc);
> > > > > - if (err)
> > > > > - return err;
> > > > > -
> > > > > - err = mmap_action_prepare(&desc, &desc.action);
> > > > > + err = __compat_vma_mmap(file, vma);
> > > > > if (err)
> > > > > return err;
> > > > >
> > > > > - set_vma_from_desc(vma, &desc);
> > > > > - return mmap_action_complete(vma, &desc.action);
> > > > > + return __compat_vma_mapped(file, vma);
> > > > > }
> > > > > EXPORT_SYMBOL(compat_vma_mmap);
> > > > >
> > > > > +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> > > > > +{
> > > > > + /* vm_ops->mapped is not valid if mmap() is specified. */
> > > > > + if (WARN_ON_ONCE(vma->vm_ops->mapped))
> > > > > + return -EINVAL;
> > > >
> > > > I think vma->vm_ops can be NULL here. Should be:
> > > >
> > > > if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
> > > > return -EINVAL;
> > >
> > > I think again you'd probably only invoke this on file-backed VMAs so it'd be ok, but again
> > > as a matter of principle we should check it so will fix, thanks!
> > >
> > > >
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(__vma_check_mmap_hook);
> >
> > nit: Any reason __vma_check_mmap_hook() is not inlined next to its
> > user vfs_mmap()?
>
> Headers fun, fs.h is a 'before mm.h' header, so vm_operations_struct is not
> declared yet here, so we can't actually do the check there.
Ack.
>
> >
> > > > > +
> > > > > static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> > > > > const struct page *page)
> > > > > {
> > > > > @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> > > > > * invoked if we do NOT merge, so we only clean up the VMA we created.
> > > > > */
> > > > > if (err) {
> > > > > - const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > > -
> > > > > - do_munmap(current->mm, vma->vm_start, len, NULL);
> > > > > -
> > > > > + unmap_vma_locked(vma);
> > > > > if (action->error_hook) {
> > > > > /* We may want to filter the error. */
> > > > > err = action->error_hook(err);
> > > > > diff --git a/mm/vma.c b/mm/vma.c
> > > > > index 054cf1d262fb..ef9f5a5365d1 100644
> > > > > --- a/mm/vma.c
> > > > > +++ b/mm/vma.c
> > > > > @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > > > > return false;
> > > > > }
> > > > >
> > > > > -static int call_action_complete(struct mmap_state *map,
> > > > > - struct mmap_action *action,
> > > > > - struct vm_area_struct *vma)
> > > > > +static int call_mapped_hook(struct vm_area_struct *vma)
> > > > > {
> > > > > - int ret;
> > > > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > > > + void *vm_private_data = vma->vm_private_data;
> > > > > + int err;
> > > > >
> > > > > - ret = mmap_action_complete(vma, action);
> > > > > + if (!vm_ops || !vm_ops->mapped)
> > > > > + return 0;
> > > > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> > > > > + vma->vm_file, &vm_private_data);
> > > > > + if (err) {
> > > > > + unmap_vma_locked(vma);
> > > > > + return err;
> > > > > + }
> > > > > + /* Update private data if changed. */
> > > > > + if (vm_private_data != vma->vm_private_data)
> > > > > + vma->vm_private_data = vm_private_data;
> > > > > + return 0;
> > > > > +}
> > > > >
> > > > > - /* If we held the file rmap we need to release it. */
> > > > > - if (map->hold_file_rmap_lock) {
> > > > > - struct file *file = vma->vm_file;
> > > > > +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> > > > > + struct vm_area_struct *vma)
> > > > > +{
> > > > > + struct file *file;
> > > > >
> > > > > - i_mmap_unlock_write(file->f_mapping);
> > > > > - }
> > > > > - return ret;
> > > > > + if (!map->hold_file_rmap_lock)
> > > > > + return;
> > > > > + file = vma->vm_file;
> > > > > + i_mmap_unlock_write(file->f_mapping);
> > > > > }
> > > > >
> > > > > static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > > > @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > > > __mmap_complete(&map, vma);
> > > > >
> > > > > if (have_mmap_prepare && allocated_new) {
> > > > > - error = call_action_complete(&map, &desc.action, vma);
> > > > > + error = mmap_action_complete(vma, &desc.action);
> > > > > + if (!error)
> > > > > + error = call_mapped_hook(vma);
> > > > >
> > > > > + maybe_drop_file_rmap_lock(&map, vma);
> > > > > if (error)
> > > > > return error;
> > > > > }
> > > > > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > > > > index 908beb263307..47d8db809f31 100644
> > > > > --- a/tools/testing/vma/include/dup.h
> > > > > +++ b/tools/testing/vma/include/dup.h
> > > > > @@ -606,12 +606,34 @@ struct vm_area_struct {
> > > > > } __randomize_layout;
> > > > >
> > > > > struct vm_operations_struct {
> > > > > - void (*open)(struct vm_area_struct * area);
> > > > > + /**
> > > > > + * @open: Called when a VMA is remapped or split. Not called upon first
> > > > > + * mapping a VMA.
> > > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > + */
> >
> > This comment should have been introduced in the previous patch.
>
> It's the testing code, it's not really important. But if I respin I'll fix... :)
Thanks!
>
> >
> > > > > + void (*open)(struct vm_area_struct *vma);
> > > > > /**
> > > > > * @close: Called when the VMA is being removed from the MM.
> > > > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > */
> > > > > - void (*close)(struct vm_area_struct * area);
> > > > > + void (*close)(struct vm_area_struct *vma);
> > > > > + /**
> > > > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > > > + * the new VMA is merged with an adjacent VMA.
> > > > > + *
> > > > > + * The @vm_private_data field is an output field allowing the user to
> > > > > + * modify vma->vm_private_data as necessary.
> > > > > + *
> > > > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > > > + * set from f_op->mmap.
> > > > > + *
> > > > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > > > + * be unmapped.
> > > > > + *
> > > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > + */
> > > > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > + const struct file *file, void **vm_private_data);
> > > > > /* Called any time before splitting to check if it's allowed */
> > > > > int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> > > > > int (*mremap)(struct vm_area_struct *area);
> > > > > @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> > > > > swap(vma->vm_file, file);
> > > > > fput(file);
> > > > > }
> > > > > +
> > > > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > > > +{
> > > > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > > +
> > > > > + mmap_assert_locked(vma->vm_mm);
> > > > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > > > +}
> > > > > --
> > > > > 2.53.0
> > > > >
> > > > >
> > >
> > > Cheers, Lorenzo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-16 14:29 ` Lorenzo Stoakes (Oracle)
@ 2026-03-17 3:41 ` Suren Baghdasaryan
2026-03-17 8:58 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 45+ messages in thread
From: Suren Baghdasaryan @ 2026-03-17 3:41 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Mon, Mar 16, 2026 at 7:29 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Sun, Mar 15, 2026 at 07:32:54PM -0700, Suren Baghdasaryan wrote:
> > On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > >
> > > On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > > > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > > >
> > > > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > > > the deprecated mmap callback.
> > > > >
> > > > > However, it did not account for the fact that mmap_prepare can fail to map
> > > > > due to an out of memory error, and thus a reference count should not be
> > > > > incremented in mmap_prepare.
> >
> > This is a bit confusing. I see the current implementation does
> > afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
> > does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
> > Is there really a problem?
>
> Firstly, mmap_prepare is invoked before we try to merge, so the VMA could in
> theory get merged and then the refcounting will be wrong.
I see now. Ok, makes sense.
>
> Secondly, mmap_prepare occurs at such a time that it is _possible_ for the
> allocation failures described below to happen.
Right, but in that case afs_file_mmap_prepare() would drop its
refcount and return an error, so refcounting is still good, no?
>
> I'll update the commit message to reflect the merge aspect actually.
Thanks!
>
> >
> > > > >
> > > > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > > > this operation to that callback which is only invoked once the mapping is
> > > > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > > > write locks are held).
> > > > >
> > > > > Therefore add afs_mapped() to implement this callback for AFS.
> > > > >
> > > > > In practice the mapping allocations are 'too small to fail', so this is
> > > > > something that realistically should never happen (or would only do so in
> > > > > a case where the process is about to die anyway), but we should still
> > > > > handle it.
> >
> > nit: I would drop the above paragraph. If it's impossible why are you
> > handling it? If it's unlikely, then handling it is even more
> > important.
>
> Sure I can drop it, but it's an ongoing thing with these small allocations.
>
> I wish we could just move to a scenario where we can simply assume allocations
> will always succeed :)
That would be really nice but unfortunately the world is not that
perfect. I just don't want to be chasing some rarely reproducible bug
because of the assumption that an allocation is too small to fail.
>
> Vlasta - thoughts?
>
> Cheers, Lorenzo
>
> >
> > > > >
> > > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > > ---
> > > > > fs/afs/file.c | 20 ++++++++++++++++----
> > > > > 1 file changed, 16 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > > > > index f609366fd2ac..69ef86f5e274 100644
> > > > > --- a/fs/afs/file.c
> > > > > +++ b/fs/afs/file.c
> > > > > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > > > > static void afs_vm_open(struct vm_area_struct *area);
> > > > > static void afs_vm_close(struct vm_area_struct *area);
> > > > > static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > + const struct file *file, void **vm_private_data);
> > > > >
> > > > > const struct file_operations afs_file_operations = {
> > > > > .open = afs_open,
> > > > > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > > > > };
> > > > >
> > > > > static const struct vm_operations_struct afs_vm_ops = {
> > > > > + .mapped = afs_mapped,
> > > > > .open = afs_vm_open,
> > > > > .close = afs_vm_close,
> > > > > .fault = filemap_fault,
> > > > > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > > > > afs_add_open_mmap(vnode);
> > > >
> > > > Is the above afs_add_open_mmap an additional one, which could cause a reference
> > > > leak? Does the above one need to be removed and only the one in afs_mapped()
> > > > needs to be kept?
> > >
> > > Ah yeah good spot, will fix thanks!
> > >
> > > >
> > > > >
> > > > > ret = generic_file_mmap_prepare(desc);
> > > > > - if (ret == 0)
> > > > > - desc->vm_ops = &afs_vm_ops;
> > > > > - else
> > > > > - afs_drop_open_mmap(vnode);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + desc->vm_ops = &afs_vm_ops;
> > > > > return ret;
> > > > > }
> > > > >
> > > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > + const struct file *file, void **vm_private_data)
> > > > > +{
> > > > > + struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > > > > +
> > > > > + afs_add_open_mmap(vnode);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static void afs_vm_open(struct vm_area_struct *vma)
> > > > > {
> > > > > afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > > > > --
> > > > > 2.53.0
> > > > >
> > > > >
> > >
> > > Cheers, Lorenzo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
2026-03-16 23:39 ` Suren Baghdasaryan
@ 2026-03-17 8:42 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 8:42 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Mon, Mar 16, 2026 at 04:39:00PM -0700, Suren Baghdasaryan wrote:
> On Mon, Mar 16, 2026 at 6:39 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > On Sun, Mar 15, 2026 at 07:18:38PM -0700, Suren Baghdasaryan wrote:
> > > On Fri, Mar 13, 2026 at 4:58 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > > >
> > > > On Fri, Mar 13, 2026 at 04:02:36AM -0700, Usama Arif wrote:
> > > > > On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > > > >
> > > > > > Previously, when a driver needed to do something like establish a reference
> > > > > > count, it could do so in the mmap hook in the knowledge that the mapping
> > > > > > would succeed.
> > > > > >
> > > > > > With the introduction of f_op->mmap_prepare this is no longer the case, as
> > > > > > it is invoked prior to actually establishing the mapping.
> > > > > >
> > > > > > To take this into account, introduce a new vm_ops->mapped callback which is
> > > > > > invoked when the VMA is first mapped (though notably - not when it is
> > > > > > merged - which is correct and mirrors existing mmap/open/close behaviour).
> > > > > >
> > > > > > We do better than vm_ops->open() here, as this callback can return an
> > > > > > error, at which point the VMA will be unmapped.
> > > > > >
> > > > > > Note that vm_ops->mapped() is invoked after any mmap action is
> > > > > > complete (such as I/O remapping).
> > > > > >
> > > > > > We intentionally do not expose the VMA at this point, exposing only the
> > > > > > fields that could be used, and an output parameter in case the operation
> > > > > > needs to update the vma->vm_private_data field.
> > > > > >
> > > > > > In order to deal with stacked filesystems which invoke an inner
> > > > > > filesystem's mmap() hook, add __compat_vma_mapped() and invoke it from
> > > > > > vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> > > > > > handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> > > > > > callback.
> > > > > >
> > > > > > We can now also remove call_action_complete() and invoke
> > > > > > mmap_action_complete() directly, as we separate out the rmap lock logic to
> > > > > > be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> > > > > >
> > > > > > We also abstract unmapping of a VMA on mmap action completion into its own
> > > > > > helper function, unmap_vma_locked().
> > > > > >
> > > > > > Additionally, update VMA userland test headers to reflect the change.
> > > > > >
> > > > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > > > ---
> > > > > > include/linux/fs.h | 9 +++-
> > > > > > include/linux/mm.h | 17 +++++++
> > > > > > mm/internal.h | 10 ++++
> > > > > > mm/util.c | 86 ++++++++++++++++++++++++---------
> > > > > > mm/vma.c | 41 +++++++++++-----
> > > > > > tools/testing/vma/include/dup.h | 34 ++++++++++++-
> > > > > > 6 files changed, 158 insertions(+), 39 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > > > index a2628a12bd2b..c390f5c667e3 100644
> > > > > > --- a/include/linux/fs.h
> > > > > > +++ b/include/linux/fs.h
> > > > > > @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> > > > > > }
> > > > > >
> > > > > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > > > > > +int __vma_check_mmap_hook(struct vm_area_struct *vma);
> > > > > >
> > > > > > static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > > {
> > > > > > + int err;
> > > > > > +
> > > > > > if (file->f_op->mmap_prepare)
> > > > > > return compat_vma_mmap(file, vma);
> > > > > >
> > > > > > - return file->f_op->mmap(file, vma);
> > > > > > + err = file->f_op->mmap(file, vma);
> > > > > > + if (err)
> > > > > > + return err;
> > > > > > +
> > > > > > + return __vma_check_mmap_hook(vma);
> > > > > > }
> > > > > >
> > > > > > static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> > > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > > index 12a0b4c63736..7333d5db1221 100644
> > > > > > --- a/include/linux/mm.h
> > > > > > +++ b/include/linux/mm.h
> > > > > > @@ -759,6 +759,23 @@ struct vm_operations_struct {
> > > > > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > > */
> > > > > > void (*close)(struct vm_area_struct *vma);
> > > > > > + /**
> > > > > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > > > > + * the new VMA is merged with an adjacent VMA.
> > > > > > + *
> > > > > > + * The @vm_private_data field is an output field allowing the user to
> > > > > > + * modify vma->vm_private_data as necessary.
> > > > > > + *
> > > > > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > > > > + * set from f_op->mmap.
> > > > > > + *
> > > > > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > > > > + * be unmapped.
> > > > > > + *
> > > > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > > + */
> > > > > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > > + const struct file *file, void **vm_private_data);
> > > > > > /* Called any time before splitting to check if it's allowed */
> > > > > > int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> > > > > > int (*mremap)(struct vm_area_struct *vma);
> > > > > > diff --git a/mm/internal.h b/mm/internal.h
> > > > > > index 7bfa85b5e78b..f0f2cf1caa36 100644
> > > > > > --- a/mm/internal.h
> > > > > > +++ b/mm/internal.h
> > > > > > @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> > > > > > * mmap hook and safely handle error conditions. On error, VMA hooks will be
> > > > > > * mutated.
> > > > > > *
> > > > > > + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> > > > > > + *
> > >
> > > What exactly would one do to "prefer f_op->mmap_prepare()"?
> >
> > I'm saying a person should implement f_op->mmap_prepare() rather than
> > f_op->mmap(), since the latter is deprecated :)
> >
> > I think that's pretty clear, no?
> >
> > > Since you are adding this comment for mmap_file(), I think you need to
> > > describe more specifically what one should call instead.
> >
> > I think it'd be a complete distraction, since if you're at the point of calling
> > mmap_file() you're already not implementing mmap_prepare except as a
> > compatibility layer.
>
> Yep, it seems like a warning that comes too late.
Yeah, it's the wrong place for it, agreed.
>
> >
> > I mean maybe I'll just drop this as it seems to be causing confusion.
>
> Maybe instead we add a comment that f_op->mmap is deprecated in favor
> of f_op->mmap_prepare() in here:
> https://elixir.bootlin.com/linux/v7.0-rc4/source/include/linux/fs.h#L1940
> ?
Yeah could do - I think maybe once the mmap_prepare changes are further along
actually, as I am still essentially figuring out what functionality to
provide/the shape of it as I develop it.
It's a bit chicken-and-egg, but doing it this way has evolved into a pretty nice
approach so far - matching what drivers _actually do_ and finding new ways of
doing those things without risk of breaking stuff, which is kinda the whole
point. This isn't a rework for rework's sake, but rather effectively a complete
change in how drivers perform mmap.
>
> >
> > >
> > > > > > * @file: File which backs the mapping.
> > > > > > * @vma: VMA which we are mapping.
> > > > > > *
> > > > > > @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> > > > > > /* unmap_vmas is in mm/memory.c */
> > > > > > void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
> > > > > >
> > > > > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > > > > +{
> > > > > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > > > +
> > > > > > + mmap_assert_locked(vma->vm_mm);
> > >
> > > You must hold the mmap write lock when unmapping. Would be better to
> > > assert mmap_assert_write_locked() or even vma_assert_write_locked(),
> > > which implies mmap_assert_write_locked().
> >
> > I'm not sure why we don't assert this in those paths.
> >
> > I think I assumed we could only assert it read-only, because one of those
> > paths downgrades the mmap write lock to a read lock.
> >
> > I don't think we can do a VMA write lock assert here - at the point of
> > do_munmap() callers can't possibly hold the VMA write lock, since they are
> > _looking up_ the VMA at the specified address.
>
> It sounds strange to me that we are unmapping a VMA that was not
> locked beforehand. Let me look into the call chains a bit more to
> convince myself one way or the other. The fact that do_munmap() looks
> up the VMA by address and then write-locks it inside
> vms_gather_munmap_vmas() does not mean the VMA was not already locked.
> vma_start_write() is re-entrant.
Well I mean:
SYSCALL_DEFINE2(munmap, ...)
-> __vm_munmap() [ takes mmap write lock ]
   -> do_vmi_munmap()
do_munmap() [ assumes (but does not assert, we should add) mmap write lock ]
-> do_vmi_munmap()
You can unmap more than one VMA from this interface, or even choose a range that
doesn't have anything mapped.
do_vmi_munmap() looks up the first VMA and exits early if none is present;
otherwise it calls into do_vmi_align_munmap(), which does the whole
gather/complete dance.
With respect to the mmap()'ing, we probably should actually always hold the VMA
write lock, because for any action to be taken the VMA couldn't have been
merged, since VMA_SPECIAL_FLAGS would be specified (any kind of remap would set
VMA_PFNMAP_BIT + friends, mapping kernel pages would set VMA_MIXEDMAP_BIT).
(Might be worth me adding an assert for that actually to avoid confusion.)
Not merging would mean __mmap_new_vma() would be called which naturally gets the
VMA write lock.
So you're right I think we should hold the VMA lock here, but I'm wondering if
it's much of a muchness since really we only _need_ the mmap write lock here.
>
> >
> > But I can convert this to an mmap_assert_write_locked()!
>
> Ok, let's go with that. I don't want to slow down your patchset while
> I investigate locking rules here. We can strengthen the assertion
> later.
Thanks!
>
> >
> > >
> > > > > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > > > > +}
> > > > > > +
> > > > > > #ifdef CONFIG_MMU
> > > > > >
> > > > > > static inline void get_anon_vma(struct anon_vma *anon_vma)
> > > > > > diff --git a/mm/util.c b/mm/util.c
> > > > > > index dba1191725b6..2b0ed54008d6 100644
> > > > > > --- a/mm/util.c
> > > > > > +++ b/mm/util.c
> > > > > > @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> > > > > > EXPORT_SYMBOL(flush_dcache_folio);
> > > > > > #endif
> > > > > >
> > > > > > +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > > +{
> > > > > > + struct vm_area_desc desc = {
> > > > > > + .mm = vma->vm_mm,
> > > > > > + .file = file,
> > > > > > + .start = vma->vm_start,
> > > > > > + .end = vma->vm_end,
> > > > > > +
> > > > > > + .pgoff = vma->vm_pgoff,
> > > > > > + .vm_file = vma->vm_file,
> > > > > > + .vma_flags = vma->flags,
> > > > > > + .page_prot = vma->vm_page_prot,
> > > > > > +
> > > > > > + .action.type = MMAP_NOTHING, /* Default */
> > > > > > + };
> > > > > > + int err;
> > > > > > +
> > > > > > + err = vfs_mmap_prepare(file, &desc);
> > > > > > + if (err)
> > > > > > + return err;
> > > > > > +
> > > > > > + err = mmap_action_prepare(&desc, &desc.action);
> > > > > > + if (err)
> > > > > > + return err;
> > > > > > +
> > > > > > + set_vma_from_desc(vma, &desc);
> > > > > > + return mmap_action_complete(vma, &desc.action);
> > > > > > +}
> > > > > > +
> > > > > > +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> > > > > > +{
> > > > > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > > > > + void *vm_private_data = vma->vm_private_data;
> > > > > > + int err;
> > > > > > +
> > > > > > + if (!vm_ops->mapped)
> > > > > > + return 0;
> > > > > > +
> > > > >
> > > > > Hello!
> > > > >
> > > > > Can vm_ops be NULL here? __compat_vma_mapped() is called from
> > > > > compat_vma_mmap(), which is reached when a filesystem provides
> > > > > mmap_prepare. If the mmap_prepare hook does not set desc->vm_ops,
> > > > > vma->vm_ops will be NULL and this dereferences a NULL pointer.
> > > >
> > > > I _think_ for this to ever be invoked, you would need to be dealing with a
> > > > file-backed VMA so vm_ops->fault would HAVE to be defined.
> > > >
> > > > But you're right anyway as a matter of principle we should check it! Will fix.
> > > >
> > > > >
> > > > > For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
> > > > > a NULL pointer dereference here.
> > > > >
> > > > > Would need to do
> > > > > if (!vm_ops || !vm_ops->mapped)
> > > > > return 0;
> > > > >
> > > > > here
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > >
> > > > > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> > > > > > + &vm_private_data);
> > > > > > + if (err)
> > > > > > + unmap_vma_locked(vma);
> > > > >
> > > > > when mapped() returns an error, unmap_vma_locked(vma) is called
> > > > > but execution continues into the vm_private_data update below. After
> > > > > unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
> > > > > entirely), so accessing vma->vm_private_data after that is a
> > > > > use-after-free.
> > > >
> > > > Very good point :) will fix thanks!
> > > >
> > > > Probably:
> > > >
> > > > if (err)
> > > > unmap_vma_locked(vma);
> > > > else if (vm_private_data != vma->vm_private_data)
> > > > vma->vm_private_data = vm_private_data;
> > > >
> > > > return err;
> > > >
> > > > Would be fine.
> > > >
> > > > >
> > > > > Probably need to do:
> > > > > if (err) {
> > > > > unmap_vma_locked(vma);
> > > > > return err;
> > > > > }
> > > > >
> > > > > > + /* Update private data if changed. */
> > > > > > + if (vm_private_data != vma->vm_private_data)
> > > > > > + vma->vm_private_data = vm_private_data;
> > > > > > +
> > > > > > + return err;
> > > > > > +}
> > > > > > +
> > > > > > /**
> > > > > > * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> > > > > > * existing VMA and execute any requested actions.
> > > > > > @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> > > > > > */
> > > > > > int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > > > > > {
> > > > > > - struct vm_area_desc desc = {
> > > > > > - .mm = vma->vm_mm,
> > > > > > - .file = file,
> > > > > > - .start = vma->vm_start,
> > > > > > - .end = vma->vm_end,
> > > > > > -
> > > > > > - .pgoff = vma->vm_pgoff,
> > > > > > - .vm_file = vma->vm_file,
> > > > > > - .vma_flags = vma->flags,
> > > > > > - .page_prot = vma->vm_page_prot,
> > > > > > -
> > > > > > - .action.type = MMAP_NOTHING, /* Default */
> > > > > > - };
> > > > > > int err;
> > > > > >
> > > > > > - err = vfs_mmap_prepare(file, &desc);
> > > > > > - if (err)
> > > > > > - return err;
> > > > > > -
> > > > > > - err = mmap_action_prepare(&desc, &desc.action);
> > > > > > + err = __compat_vma_mmap(file, vma);
> > > > > > if (err)
> > > > > > return err;
> > > > > >
> > > > > > - set_vma_from_desc(vma, &desc);
> > > > > > - return mmap_action_complete(vma, &desc.action);
> > > > > > + return __compat_vma_mapped(file, vma);
> > > > > > }
> > > > > > EXPORT_SYMBOL(compat_vma_mmap);
> > > > > >
> > > > > > +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> > > > > > +{
> > > > > > + /* vm_ops->mapped is not valid if mmap() is specified. */
> > > > > > + if (WARN_ON_ONCE(vma->vm_ops->mapped))
> > > > > > + return -EINVAL;
> > > > >
> > > > > I think vma->vm_ops can be NULL here. Should be:
> > > > >
> > > > > if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
> > > > > return -EINVAL;
> > > >
> > > > I think again you'd probably only invoke this on a file-backed VMA so it'd
> > > > be ok, but again as a matter of principle we should check it, so will fix,
> > > > thanks!
> > > >
> > > > >
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(__vma_check_mmap_hook);
> > >
> > > nit: Any reason __vma_check_mmap_hook() is not inlined next to its
> > > user vfs_mmap()?
> >
> > Headers fun - fs.h is a 'before mm.h' header, so vm_operations_struct is not
> > yet declared at that point, meaning we can't actually do the check there.
>
> Ack.
>
> >
> > >
> > > > > > +
> > > > > > static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> > > > > > const struct page *page)
> > > > > > {
> > > > > > @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> > > > > > * invoked if we do NOT merge, so we only clean up the VMA we created.
> > > > > > */
> > > > > > if (err) {
> > > > > > - const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > > > -
> > > > > > - do_munmap(current->mm, vma->vm_start, len, NULL);
> > > > > > -
> > > > > > + unmap_vma_locked(vma);
> > > > > > if (action->error_hook) {
> > > > > > /* We may want to filter the error. */
> > > > > > err = action->error_hook(err);
> > > > > > diff --git a/mm/vma.c b/mm/vma.c
> > > > > > index 054cf1d262fb..ef9f5a5365d1 100644
> > > > > > --- a/mm/vma.c
> > > > > > +++ b/mm/vma.c
> > > > > > @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> > > > > > return false;
> > > > > > }
> > > > > >
> > > > > > -static int call_action_complete(struct mmap_state *map,
> > > > > > - struct mmap_action *action,
> > > > > > - struct vm_area_struct *vma)
> > > > > > +static int call_mapped_hook(struct vm_area_struct *vma)
> > > > > > {
> > > > > > - int ret;
> > > > > > + const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > > > > > + void *vm_private_data = vma->vm_private_data;
> > > > > > + int err;
> > > > > >
> > > > > > - ret = mmap_action_complete(vma, action);
> > > > > > + if (!vm_ops || !vm_ops->mapped)
> > > > > > + return 0;
> > > > > > + err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> > > > > > + vma->vm_file, &vm_private_data);
> > > > > > + if (err) {
> > > > > > + unmap_vma_locked(vma);
> > > > > > + return err;
> > > > > > + }
> > > > > > + /* Update private data if changed. */
> > > > > > + if (vm_private_data != vma->vm_private_data)
> > > > > > + vma->vm_private_data = vm_private_data;
> > > > > > + return 0;
> > > > > > +}
> > > > > >
> > > > > > - /* If we held the file rmap we need to release it. */
> > > > > > - if (map->hold_file_rmap_lock) {
> > > > > > - struct file *file = vma->vm_file;
> > > > > > +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> > > > > > + struct vm_area_struct *vma)
> > > > > > +{
> > > > > > + struct file *file;
> > > > > >
> > > > > > - i_mmap_unlock_write(file->f_mapping);
> > > > > > - }
> > > > > > - return ret;
> > > > > > + if (!map->hold_file_rmap_lock)
> > > > > > + return;
> > > > > > + file = vma->vm_file;
> > > > > > + i_mmap_unlock_write(file->f_mapping);
> > > > > > }
> > > > > >
> > > > > > static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > > > > @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > > > > > __mmap_complete(&map, vma);
> > > > > >
> > > > > > if (have_mmap_prepare && allocated_new) {
> > > > > > - error = call_action_complete(&map, &desc.action, vma);
> > > > > > + error = mmap_action_complete(vma, &desc.action);
> > > > > > + if (!error)
> > > > > > + error = call_mapped_hook(vma);
> > > > > >
> > > > > > + maybe_drop_file_rmap_lock(&map, vma);
> > > > > > if (error)
> > > > > > return error;
> > > > > > }
> > > > > > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > > > > > index 908beb263307..47d8db809f31 100644
> > > > > > --- a/tools/testing/vma/include/dup.h
> > > > > > +++ b/tools/testing/vma/include/dup.h
> > > > > > @@ -606,12 +606,34 @@ struct vm_area_struct {
> > > > > > } __randomize_layout;
> > > > > >
> > > > > > struct vm_operations_struct {
> > > > > > - void (*open)(struct vm_area_struct * area);
> > > > > > + /**
> > > > > > + * @open: Called when a VMA is remapped or split. Not called upon first
> > > > > > + * mapping a VMA.
> > > > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > > + */
> > >
> > > This comment should have been introduced in the previous patch.
> >
> > It's the testing code, it's not really important. But if I respin I'll fix... :)
>
> Thanks!
>
> >
> > >
> > > > > > + void (*open)(struct vm_area_struct *vma);
> > > > > > /**
> > > > > > * @close: Called when the VMA is being removed from the MM.
> > > > > > * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > > */
> > > > > > - void (*close)(struct vm_area_struct * area);
> > > > > > + void (*close)(struct vm_area_struct *vma);
> > > > > > + /**
> > > > > > + * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > > > > > + * the new VMA is merged with an adjacent VMA.
> > > > > > + *
> > > > > > + * The @vm_private_data field is an output field allowing the user to
> > > > > > + * modify vma->vm_private_data as necessary.
> > > > > > + *
> > > > > > + * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > > > > > + * set from f_op->mmap.
> > > > > > + *
> > > > > > + * Returns %0 on success, or an error otherwise. On error, the VMA will
> > > > > > + * be unmapped.
> > > > > > + *
> > > > > > + * Context: User context. May sleep. Caller holds mmap_lock.
> > > > > > + */
> > > > > > + int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > > + const struct file *file, void **vm_private_data);
> > > > > > /* Called any time before splitting to check if it's allowed */
> > > > > > int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> > > > > > int (*mremap)(struct vm_area_struct *area);
> > > > > > @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> > > > > > swap(vma->vm_file, file);
> > > > > > fput(file);
> > > > > > }
> > > > > > +
> > > > > > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > > > > > +{
> > > > > > + const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > > > +
> > > > > > + mmap_assert_locked(vma->vm_mm);
> > > > > > + do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > > > > > +}
> > > > > > --
> > > > > > 2.53.0
> > > > > >
> > > > > >
> > > >
> > > > Cheers, Lorenzo
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
2026-03-17 3:41 ` Suren Baghdasaryan
@ 2026-03-17 8:58 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 45+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-17 8:58 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
Alexandre Torgue, Miquel Raynal, Richard Weinberger,
Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Michal Hocko, Jann Horn, Pedro Falcato,
linux-kernel, linux-doc, linux-hyperv, linux-stm32,
linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
On Mon, Mar 16, 2026 at 08:41:48PM -0700, Suren Baghdasaryan wrote:
> On Mon, Mar 16, 2026 at 7:29 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > On Sun, Mar 15, 2026 at 07:32:54PM -0700, Suren Baghdasaryan wrote:
> > > On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > > >
> > > > On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > > > > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > > > >
> > > > > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > > > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > > > > the deprecated mmap callback.
> > > > > >
> > > > > > However, it did not account for the fact that the mapping can still
> > > > > > fail after mmap_prepare due to an out-of-memory error, and thus a
> > > > > > reference count should not be incremented in mmap_prepare.
> > >
> > > This is a bit confusing. I see the current implementation does
> > > afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
> > > does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
> > > Is there really a problem?
> >
> > Firstly, mmap_prepare is invoked before we try to merge, so the VMA could in
> > theory get merged and then the refcounting will be wrong.
>
> I see now. Ok, makes sense.
>
> >
> > Secondly, mmap_prepare occurs at such a time that it is _possible_ for
> > allocation failures as described below to happen.
>
> Right, but in that case afs_file_mmap_prepare() would drop its
> refcount and return an error, so refcounting is still good, no?
Nope, in __mmap_region():

  call_mmap_prepare()
  -> __mmap_new_vma()
       vm_area_alloc()                            -> can fail
       vma_iter_prealloc()                        -> can fail
       __mmap_new_file_vma() / shmem_zero_setup() -> can fail

If any of those fail the VMA is not even set up, so no close() will be called
because there's no VMA to call close on.
This is what makes mmap_prepare very different from mmap, which passes in a
(partially established) VMA.
That and of course a potential merge would mean any refcount increment would be
wrong.
>
> >
> > I'll update the commit message to reflect the merge aspect actually.
>
> Thanks!
You're welcome, and done in v2 :)
>
> >
> > >
> > > > > >
> > > > > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > > > > this operation to that callback which is only invoked once the mapping is
> > > > > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > > > > write locks are held).
> > > > > >
> > > > > > Therefore add afs_mapped() to implement this callback for AFS.
> > > > > >
> > > > > > In practice the mapping allocations are 'too small to fail' so this is
> > > > > > something that realistically should never happen in practice (or would do
> > > > > > so in a case where the process is about to die anyway), but we should still
> > > > > > handle this.
> > >
> > > nit: I would drop the above paragraph. If it's impossible why are you
> > > handling it? If it's unlikely, then handling it is even more
> > > important.
> >
> > Sure I can drop it, but it's an ongoing thing with these small allocations.
> >
> > I wish we could just move to a scenario where we can simply assume allocations
> > will always succeed :)
>
> That would be really nice but unfortunately the world is not that
> perfect. I just don't want to be chasing some rarely reproducible bug
> because of the assumption that an allocation is too small to fail.
I mean I agree, we should handle all error paths.
Cheers, Lorenzo
end of thread, other threads:[~2026-03-17 8:58 UTC | newest]
Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-12 20:27 [PATCH 00/15] mm: expand mmap_prepare functionality and usage Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 01/15] mm: various small mmap_prepare cleanups Lorenzo Stoakes (Oracle)
2026-03-12 21:14 ` Andrew Morton
2026-03-13 12:13 ` Lorenzo Stoakes (Oracle)
2026-03-15 22:56 ` Suren Baghdasaryan
2026-03-15 23:06 ` Suren Baghdasaryan
2026-03-16 14:47 ` Lorenzo Stoakes (Oracle)
2026-03-16 14:44 ` Lorenzo Stoakes (Oracle)
2026-03-16 21:27 ` Suren Baghdasaryan
2026-03-12 20:27 ` [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback Lorenzo Stoakes (Oracle)
2026-03-13 0:12 ` Randy Dunlap
2026-03-16 14:51 ` Lorenzo Stoakes (Oracle)
2026-03-15 23:23 ` Suren Baghdasaryan
2026-03-16 19:16 ` Lorenzo Stoakes (Oracle)
2026-03-16 22:59 ` Suren Baghdasaryan
2026-03-12 20:27 ` [PATCH 03/15] mm: document vm_operations_struct->open the same as close() Lorenzo Stoakes (Oracle)
2026-03-16 0:43 ` Suren Baghdasaryan
2026-03-16 14:31 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 04/15] mm: add vm_ops->mapped hook Lorenzo Stoakes (Oracle)
2026-03-13 11:02 ` Usama Arif
2026-03-13 11:58 ` Lorenzo Stoakes (Oracle)
2026-03-16 2:18 ` Suren Baghdasaryan
2026-03-16 13:39 ` Lorenzo Stoakes (Oracle)
2026-03-16 23:39 ` Suren Baghdasaryan
2026-03-17 8:42 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure Lorenzo Stoakes (Oracle)
2026-03-13 11:07 ` Usama Arif
2026-03-13 12:00 ` Lorenzo Stoakes (Oracle)
2026-03-16 2:32 ` Suren Baghdasaryan
2026-03-16 14:29 ` Lorenzo Stoakes (Oracle)
2026-03-17 3:41 ` Suren Baghdasaryan
2026-03-17 8:58 ` Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 06/15] mm: add mmap_action_simple_ioremap() Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 07/15] misc: open-dice: replace deprecated mmap hook with mmap_prepare Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 08/15] hpet: " Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 09/15] mtdchar: replace deprecated mmap hook with mmap_prepare, clean up Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 10/15] stm: replace deprecated mmap hook with mmap_prepare Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 11/15] staging: vme_user: " Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 12/15] mm: allow handling of stacked mmap_prepare hooks in more drivers Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 13/15] drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 14/15] uio: replace deprecated mmap hook with mmap_prepare in uio_info Lorenzo Stoakes (Oracle)
2026-03-12 20:27 ` [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]() Lorenzo Stoakes (Oracle)
2026-03-12 23:15 ` Randy Dunlap
2026-03-16 14:54 ` Lorenzo Stoakes (Oracle)
2026-03-12 21:23 ` [PATCH 00/15] mm: expand mmap_prepare functionality and usage Andrew Morton