* [PATCH v2 01/16] mm/shmem: update shmem to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
@ 2025-09-10 20:21 ` Lorenzo Stoakes
2025-09-11 8:32 ` Jan Kara
2025-09-10 20:21 ` [PATCH v2 02/16] device/dax: update devdax " Lorenzo Stoakes
` (15 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
This simply assigns the vm_ops, so it is easily updated - do so.
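For reference, the vm_area_desc structure received by f_op->mmap_prepare
looks roughly as follows - a sketch reconstructed from the fields used
throughout this series (desc->file, start/end, pgoff, vm_flags, vm_ops,
private_data), so consult include/linux/mm_types.h for the authoritative
definition:

struct vm_area_desc {
	/* Immutable state: the mm and the range being mapped. */
	struct mm_struct *mm;
	unsigned long start;
	unsigned long end;

	/* Mutable fields, pre-populated with the proposed mapping state. */
	pgoff_t pgoff;
	struct file *file;
	vm_flags_t vm_flags;
	pgprot_t page_prot;

	/* Write-only fields, applied when the VMA is actually created. */
	const struct vm_operations_struct *vm_ops;
	void *private_data;
};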
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/shmem.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 45e7733d6612..990e33c6a776 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2938,16 +2938,17 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
return retval;
}
-static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
+static int shmem_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *file = desc->file;
struct inode *inode = file_inode(file);
file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
if (inode->i_nlink)
- vma->vm_ops = &shmem_vm_ops;
+ desc->vm_ops = &shmem_vm_ops;
else
- vma->vm_ops = &shmem_anon_vm_ops;
+ desc->vm_ops = &shmem_anon_vm_ops;
return 0;
}
@@ -5217,7 +5218,7 @@ static const struct address_space_operations shmem_aops = {
};
static const struct file_operations shmem_file_operations = {
- .mmap = shmem_mmap,
+ .mmap_prepare = shmem_mmap_prepare,
.open = shmem_file_open,
.get_unmapped_area = shmem_get_unmapped_area,
#ifdef CONFIG_TMPFS
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 01/16] mm/shmem: update shmem to use mmap_prepare
2025-09-10 20:21 ` [PATCH v2 01/16] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
@ 2025-09-11 8:32 ` Jan Kara
0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2025-09-11 8:32 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed 10-09-25 21:21:56, Lorenzo Stoakes wrote:
> This simply assigns the vm_ops so is easily updated - do so.
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> mm/shmem.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 45e7733d6612..990e33c6a776 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2938,16 +2938,17 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
> return retval;
> }
>
> -static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
> +static int shmem_mmap_prepare(struct vm_area_desc *desc)
> {
> + struct file *file = desc->file;
> struct inode *inode = file_inode(file);
>
> file_accessed(file);
> /* This is anonymous shared memory if it is unlinked at the time of mmap */
> if (inode->i_nlink)
> - vma->vm_ops = &shmem_vm_ops;
> + desc->vm_ops = &shmem_vm_ops;
> else
> - vma->vm_ops = &shmem_anon_vm_ops;
> + desc->vm_ops = &shmem_anon_vm_ops;
> return 0;
> }
>
> @@ -5217,7 +5218,7 @@ static const struct address_space_operations shmem_aops = {
> };
>
> static const struct file_operations shmem_file_operations = {
> - .mmap = shmem_mmap,
> + .mmap_prepare = shmem_mmap_prepare,
> .open = shmem_file_open,
> .get_unmapped_area = shmem_get_unmapped_area,
> #ifdef CONFIG_TMPFS
> --
> 2.51.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 02/16] device/dax: update devdax to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
2025-09-10 20:21 ` [PATCH v2 01/16] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
@ 2025-09-10 20:21 ` Lorenzo Stoakes
2025-09-11 8:35 ` Jan Kara
2025-09-10 20:21 ` [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
` (14 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
The devdax driver does nothing special in its f_op->mmap hook, so
straightforwardly update it to use the mmap_prepare hook instead.
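Note the flag update changes shape as well as location: with the legacy
hook the VMA already exists and is locked, so flags must be updated via
vm_flags_set(), whereas at mmap_prepare time there is no VMA yet and the
descriptor's flags are plain data. A rough illustration, assuming the
semantics described in this series:

	/* Legacy f_op->mmap: the VMA exists, use the locked helper. */
	vm_flags_set(vma, VM_HUGEPAGE);

	/* f_op->mmap_prepare: stage the flag on the descriptor; the
	 * core applies it when it creates the VMA. */
	desc->vm_flags |= VM_HUGEPAGE;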
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
drivers/dax/device.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 2bb40a6060af..c2181439f925 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -13,8 +13,9 @@
#include "dax-private.h"
#include "bus.h"
-static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
- const char *func)
+static int __check_vma(struct dev_dax *dev_dax, vm_flags_t vm_flags,
+ unsigned long start, unsigned long end, struct file *file,
+ const char *func)
{
struct device *dev = &dev_dax->dev;
unsigned long mask;
@@ -23,7 +24,7 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
return -ENXIO;
/* prevent private mappings from being established */
- if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
+ if ((vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
dev_info_ratelimited(dev,
"%s: %s: fail, attempted private mapping\n",
current->comm, func);
@@ -31,15 +32,15 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
}
mask = dev_dax->align - 1;
- if (vma->vm_start & mask || vma->vm_end & mask) {
+ if (start & mask || end & mask) {
dev_info_ratelimited(dev,
"%s: %s: fail, unaligned vma (%#lx - %#lx, %#lx)\n",
- current->comm, func, vma->vm_start, vma->vm_end,
+ current->comm, func, start, end,
mask);
return -EINVAL;
}
- if (!vma_is_dax(vma)) {
+ if (!file_is_dax(file)) {
dev_info_ratelimited(dev,
"%s: %s: fail, vma is not DAX capable\n",
current->comm, func);
@@ -49,6 +50,13 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
return 0;
}
+static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
+ const char *func)
+{
+ return __check_vma(dev_dax, vma->vm_flags, vma->vm_start, vma->vm_end,
+ vma->vm_file, func);
+}
+
/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
unsigned long size)
@@ -285,8 +293,9 @@ static const struct vm_operations_struct dax_vm_ops = {
.pagesize = dev_dax_pagesize,
};
-static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
+static int dax_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *filp = desc->file;
struct dev_dax *dev_dax = filp->private_data;
int rc, id;
@@ -297,13 +306,14 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
* fault time.
*/
id = dax_read_lock();
- rc = check_vma(dev_dax, vma, __func__);
+ rc = __check_vma(dev_dax, desc->vm_flags, desc->start, desc->end, filp,
+ __func__);
dax_read_unlock(id);
if (rc)
return rc;
- vma->vm_ops = &dax_vm_ops;
- vm_flags_set(vma, VM_HUGEPAGE);
+ desc->vm_ops = &dax_vm_ops;
+ desc->vm_flags |= VM_HUGEPAGE;
return 0;
}
@@ -377,7 +387,7 @@ static const struct file_operations dax_fops = {
.open = dax_open,
.release = dax_release,
.get_unmapped_area = dax_get_unmapped_area,
- .mmap = dax_mmap,
+ .mmap_prepare = dax_mmap_prepare,
.fop_flags = FOP_MMAP_SYNC,
};
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 02/16] device/dax: update devdax to use mmap_prepare
2025-09-10 20:21 ` [PATCH v2 02/16] device/dax: update devdax " Lorenzo Stoakes
@ 2025-09-11 8:35 ` Jan Kara
0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2025-09-11 8:35 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed 10-09-25 21:21:57, Lorenzo Stoakes wrote:
> The devdax driver does nothing special in its f_op->mmap hook, so
> straightforwardly update it to use the mmap_prepare hook instead.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> drivers/dax/device.c | 32 +++++++++++++++++++++-----------
> 1 file changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 2bb40a6060af..c2181439f925 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -13,8 +13,9 @@
> #include "dax-private.h"
> #include "bus.h"
>
> -static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
> - const char *func)
> +static int __check_vma(struct dev_dax *dev_dax, vm_flags_t vm_flags,
> + unsigned long start, unsigned long end, struct file *file,
> + const char *func)
> {
> struct device *dev = &dev_dax->dev;
> unsigned long mask;
> @@ -23,7 +24,7 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
> return -ENXIO;
>
> /* prevent private mappings from being established */
> - if ((vma->vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
> + if ((vm_flags & VM_MAYSHARE) != VM_MAYSHARE) {
> dev_info_ratelimited(dev,
> "%s: %s: fail, attempted private mapping\n",
> current->comm, func);
> @@ -31,15 +32,15 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
> }
>
> mask = dev_dax->align - 1;
> - if (vma->vm_start & mask || vma->vm_end & mask) {
> + if (start & mask || end & mask) {
> dev_info_ratelimited(dev,
> "%s: %s: fail, unaligned vma (%#lx - %#lx, %#lx)\n",
> - current->comm, func, vma->vm_start, vma->vm_end,
> + current->comm, func, start, end,
> mask);
> return -EINVAL;
> }
>
> - if (!vma_is_dax(vma)) {
> + if (!file_is_dax(file)) {
> dev_info_ratelimited(dev,
> "%s: %s: fail, vma is not DAX capable\n",
> current->comm, func);
> @@ -49,6 +50,13 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
> return 0;
> }
>
> +static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
> + const char *func)
> +{
> + return __check_vma(dev_dax, vma->vm_flags, vma->vm_start, vma->vm_end,
> + vma->vm_file, func);
> +}
> +
> /* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
> __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
> unsigned long size)
> @@ -285,8 +293,9 @@ static const struct vm_operations_struct dax_vm_ops = {
> .pagesize = dev_dax_pagesize,
> };
>
> -static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
> +static int dax_mmap_prepare(struct vm_area_desc *desc)
> {
> + struct file *filp = desc->file;
> struct dev_dax *dev_dax = filp->private_data;
> int rc, id;
>
> @@ -297,13 +306,14 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
> * fault time.
> */
> id = dax_read_lock();
> - rc = check_vma(dev_dax, vma, __func__);
> + rc = __check_vma(dev_dax, desc->vm_flags, desc->start, desc->end, filp,
> + __func__);
> dax_read_unlock(id);
> if (rc)
> return rc;
>
> - vma->vm_ops = &dax_vm_ops;
> - vm_flags_set(vma, VM_HUGEPAGE);
> + desc->vm_ops = &dax_vm_ops;
> + desc->vm_flags |= VM_HUGEPAGE;
> return 0;
> }
>
> @@ -377,7 +387,7 @@ static const struct file_operations dax_fops = {
> .open = dax_open,
> .release = dax_release,
> .get_unmapped_area = dax_get_unmapped_area,
> - .mmap = dax_mmap,
> + .mmap_prepare = dax_mmap_prepare,
> .fop_flags = FOP_MMAP_SYNC,
> };
>
> --
> 2.51.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
2025-09-10 20:21 ` [PATCH v2 01/16] mm/shmem: update shmem to use mmap_prepare Lorenzo Stoakes
2025-09-10 20:21 ` [PATCH v2 02/16] device/dax: update devdax " Lorenzo Stoakes
@ 2025-09-10 20:21 ` Lorenzo Stoakes
2025-09-11 8:36 ` Jan Kara
2025-09-12 17:56 ` David Hildenbrand
2025-09-10 20:21 ` [PATCH v2 04/16] relay: update relay to use mmap_prepare Lorenzo Stoakes
` (13 subsequent siblings)
16 siblings, 2 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
It's useful to be able to determine the size of the VMA descriptor range
used in f_op->mmap_prepare, expressed both in bytes and in pages, so add
helpers for both and update code that could make use of them to do so.
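A hypothetical usage sketch to show the intent - foo_dev, buf_size and
foo_vm_ops are illustrative only, not part of this series:

static int foo_mmap_prepare(struct vm_area_desc *desc)
{
	struct foo_dev *foo = desc->file->private_data;

	/* Reject mappings larger than the backing buffer. */
	if (vma_desc_size(desc) > foo->buf_size)
		return -EINVAL;

	desc->vm_ops = &foo_vm_ops;
	return 0;
}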
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/ntfs3/file.c | 2 +-
include/linux/mm.h | 10 ++++++++++
mm/secretmem.c | 2 +-
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index c1ece707b195..86eb88f62714 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -304,7 +304,7 @@ static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
if (rw) {
u64 to = min_t(loff_t, i_size_read(inode),
- from + desc->end - desc->start);
+ from + vma_desc_size(desc));
if (is_sparsed(ni)) {
/* Allocate clusters for rw map. */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 892fe5dbf9de..0b97589aec6d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3572,6 +3572,16 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}
+static inline unsigned long vma_desc_size(struct vm_area_desc *desc)
+{
+ return desc->end - desc->start;
+}
+
+static inline unsigned long vma_desc_pages(struct vm_area_desc *desc)
+{
+ return vma_desc_size(desc) >> PAGE_SHIFT;
+}
+
/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
unsigned long vm_start, unsigned long vm_end)
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 60137305bc20..62066ddb1e9c 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -120,7 +120,7 @@ static int secretmem_release(struct inode *inode, struct file *file)
static int secretmem_mmap_prepare(struct vm_area_desc *desc)
{
- const unsigned long len = desc->end - desc->start;
+ const unsigned long len = vma_desc_size(desc);
if ((desc->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
return -EINVAL;
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers
2025-09-10 20:21 ` [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
@ 2025-09-11 8:36 ` Jan Kara
2025-09-12 17:56 ` David Hildenbrand
1 sibling, 0 replies; 55+ messages in thread
From: Jan Kara @ 2025-09-11 8:36 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed 10-09-25 21:21:58, Lorenzo Stoakes wrote:
> It's useful to be able to determine the size of a VMA descriptor range used
> on f_op->mmap_prepare, expressed both in bytes and pages, so add helpers
> for both and update code that could make use of it to do so.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Looks good, I presume more users will come later in the series :). Feel
free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ntfs3/file.c | 2 +-
> include/linux/mm.h | 10 ++++++++++
> mm/secretmem.c | 2 +-
> 3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> index c1ece707b195..86eb88f62714 100644
> --- a/fs/ntfs3/file.c
> +++ b/fs/ntfs3/file.c
> @@ -304,7 +304,7 @@ static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
>
> if (rw) {
> u64 to = min_t(loff_t, i_size_read(inode),
> - from + desc->end - desc->start);
> + from + vma_desc_size(desc));
>
> if (is_sparsed(ni)) {
> /* Allocate clusters for rw map. */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 892fe5dbf9de..0b97589aec6d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3572,6 +3572,16 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
> return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> }
>
> +static inline unsigned long vma_desc_size(struct vm_area_desc *desc)
> +{
> + return desc->end - desc->start;
> +}
> +
> +static inline unsigned long vma_desc_pages(struct vm_area_desc *desc)
> +{
> + return vma_desc_size(desc) >> PAGE_SHIFT;
> +}
> +
> /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> unsigned long vm_start, unsigned long vm_end)
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 60137305bc20..62066ddb1e9c 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -120,7 +120,7 @@ static int secretmem_release(struct inode *inode, struct file *file)
>
> static int secretmem_mmap_prepare(struct vm_area_desc *desc)
> {
> - const unsigned long len = desc->end - desc->start;
> + const unsigned long len = vma_desc_size(desc);
>
> if ((desc->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> return -EINVAL;
> --
> 2.51.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers
2025-09-10 20:21 ` [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
2025-09-11 8:36 ` Jan Kara
@ 2025-09-12 17:56 ` David Hildenbrand
2025-09-15 10:12 ` Lorenzo Stoakes
1 sibling, 1 reply; 55+ messages in thread
From: David Hildenbrand @ 2025-09-12 17:56 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, Konstantin Komarov, Baoquan He, Vivek Goyal,
Dave Young, Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On 10.09.25 22:21, Lorenzo Stoakes wrote:
> It's useful to be able to determine the size of a VMA descriptor range used
> on f_op->mmap_prepare, expressed both in bytes and pages, so add helpers
> for both and update code that could make use of it to do so.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> fs/ntfs3/file.c | 2 +-
> include/linux/mm.h | 10 ++++++++++
> mm/secretmem.c | 2 +-
> 3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> index c1ece707b195..86eb88f62714 100644
> --- a/fs/ntfs3/file.c
> +++ b/fs/ntfs3/file.c
> @@ -304,7 +304,7 @@ static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
>
> if (rw) {
> u64 to = min_t(loff_t, i_size_read(inode),
> - from + desc->end - desc->start);
> + from + vma_desc_size(desc));
>
> if (is_sparsed(ni)) {
> /* Allocate clusters for rw map. */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 892fe5dbf9de..0b97589aec6d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3572,6 +3572,16 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
> return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> }
>
> +static inline unsigned long vma_desc_size(struct vm_area_desc *desc)
> +{
> + return desc->end - desc->start;
> +}
> +
> +static inline unsigned long vma_desc_pages(struct vm_area_desc *desc)
> +{
> + return vma_desc_size(desc) >> PAGE_SHIFT;
> +}
Should parameters in both functions be const * ?
> +
> /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> unsigned long vm_start, unsigned long vm_end)
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 60137305bc20..62066ddb1e9c 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -120,7 +120,7 @@ static int secretmem_release(struct inode *inode, struct file *file)
>
> static int secretmem_mmap_prepare(struct vm_area_desc *desc)
> {
> - const unsigned long len = desc->end - desc->start;
> + const unsigned long len = vma_desc_size(desc);
>
> if ((desc->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> return -EINVAL;
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers
2025-09-12 17:56 ` David Hildenbrand
@ 2025-09-15 10:12 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 10:12 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, Konstantin Komarov,
Baoquan He, Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre,
Dave Martin, James Morse, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Fri, Sep 12, 2025 at 07:56:46PM +0200, David Hildenbrand wrote:
> On 10.09.25 22:21, Lorenzo Stoakes wrote:
> > It's useful to be able to determine the size of a VMA descriptor range used
> > on f_op->mmap_prepare, expressed both in bytes and pages, so add helpers
> > for both and update code that could make use of it to do so.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > fs/ntfs3/file.c | 2 +-
> > include/linux/mm.h | 10 ++++++++++
> > mm/secretmem.c | 2 +-
> > 3 files changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> > index c1ece707b195..86eb88f62714 100644
> > --- a/fs/ntfs3/file.c
> > +++ b/fs/ntfs3/file.c
> > @@ -304,7 +304,7 @@ static int ntfs_file_mmap_prepare(struct vm_area_desc *desc)
> > if (rw) {
> > u64 to = min_t(loff_t, i_size_read(inode),
> > - from + desc->end - desc->start);
> > + from + vma_desc_size(desc));
> > if (is_sparsed(ni)) {
> > /* Allocate clusters for rw map. */
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 892fe5dbf9de..0b97589aec6d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3572,6 +3572,16 @@ static inline unsigned long vma_pages(const struct vm_area_struct *vma)
> > return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> > }
> > +static inline unsigned long vma_desc_size(struct vm_area_desc *desc)
> > +{
> > + return desc->end - desc->start;
> > +}
> > +
> > +static inline unsigned long vma_desc_pages(struct vm_area_desc *desc)
> > +{
> > + return vma_desc_size(desc) >> PAGE_SHIFT;
> > +}
>
> Should parameters in both functions be const * ?
Can do, will fix up if respin.
>
> > +
> > /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
> > static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
> > unsigned long vm_start, unsigned long vm_end)
> > diff --git a/mm/secretmem.c b/mm/secretmem.c
> > index 60137305bc20..62066ddb1e9c 100644
> > --- a/mm/secretmem.c
> > +++ b/mm/secretmem.c
> > @@ -120,7 +120,7 @@ static int secretmem_release(struct inode *inode, struct file *file)
> > static int secretmem_mmap_prepare(struct vm_area_desc *desc)
> > {
> > - const unsigned long len = desc->end - desc->start;
> > + const unsigned long len = vma_desc_size(desc);
> > if ((desc->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> > return -EINVAL;
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks!
>
> --
> Cheers
>
> David / dhildenb
>
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 04/16] relay: update relay to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (2 preceding siblings ...)
2025-09-10 20:21 ` [PATCH v2 03/16] mm: add vma_desc_size(), vma_desc_pages() helpers Lorenzo Stoakes
@ 2025-09-10 20:21 ` Lorenzo Stoakes
2025-09-11 8:38 ` Jan Kara
2025-09-10 20:22 ` [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
` (12 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
It is relatively trivial to update this code to use the f_op->mmap_prepare
hook in place of the deprecated f_op->mmap hook, so do so.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
kernel/relay.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/kernel/relay.c b/kernel/relay.c
index 8d915fe98198..e36f6b926f7f 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -72,17 +72,18 @@ static void relay_free_page_array(struct page **array)
}
/**
- * relay_mmap_buf: - mmap channel buffer to process address space
- * @buf: relay channel buffer
- * @vma: vm_area_struct describing memory to be mapped
+ * relay_mmap_prepare_buf: - mmap channel buffer to process address space
+ * @buf: the relay channel buffer
+ * @desc: describing what to map
*
* Returns 0 if ok, negative on error
*
* Caller should already have grabbed mmap_lock.
*/
-static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
+static int relay_mmap_prepare_buf(struct rchan_buf *buf,
+ struct vm_area_desc *desc)
{
- unsigned long length = vma->vm_end - vma->vm_start;
+ unsigned long length = vma_desc_size(desc);
if (!buf)
return -EBADF;
@@ -90,9 +91,9 @@ static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
if (length != (unsigned long)buf->chan->alloc_size)
return -EINVAL;
- vma->vm_ops = &relay_file_mmap_ops;
- vm_flags_set(vma, VM_DONTEXPAND);
- vma->vm_private_data = buf;
+ desc->vm_ops = &relay_file_mmap_ops;
+ desc->vm_flags |= VM_DONTEXPAND;
+ desc->private_data = buf;
return 0;
}
@@ -749,16 +750,16 @@ static int relay_file_open(struct inode *inode, struct file *filp)
}
/**
- * relay_file_mmap - mmap file op for relay files
- * @filp: the file
- * @vma: the vma describing what to map
+ * relay_file_mmap_prepare - mmap file op for relay files
+ * @desc: describing what to map
*
- * Calls upon relay_mmap_buf() to map the file into user space.
+ * Calls upon relay_mmap_prepare_buf() to map the file into user space.
*/
-static int relay_file_mmap(struct file *filp, struct vm_area_struct *vma)
+static int relay_file_mmap_prepare(struct vm_area_desc *desc)
{
- struct rchan_buf *buf = filp->private_data;
- return relay_mmap_buf(buf, vma);
+ struct rchan_buf *buf = desc->file->private_data;
+
+ return relay_mmap_prepare_buf(buf, desc);
}
/**
@@ -1006,7 +1007,7 @@ static ssize_t relay_file_read(struct file *filp,
const struct file_operations relay_file_operations = {
.open = relay_file_open,
.poll = relay_file_poll,
- .mmap = relay_file_mmap,
+ .mmap_prepare = relay_file_mmap_prepare,
.read = relay_file_read,
.release = relay_file_release,
};
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 04/16] relay: update relay to use mmap_prepare
2025-09-10 20:21 ` [PATCH v2 04/16] relay: update relay to use mmap_prepare Lorenzo Stoakes
@ 2025-09-11 8:38 ` Jan Kara
0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2025-09-11 8:38 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed 10-09-25 21:21:59, Lorenzo Stoakes wrote:
> It is relatively trivial to update this code to use the f_op->mmap_prepare
> hook in favour of the deprecated f_op->mmap hook, so do so.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> kernel/relay.c | 33 +++++++++++++++++----------------
> 1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/relay.c b/kernel/relay.c
> index 8d915fe98198..e36f6b926f7f 100644
> --- a/kernel/relay.c
> +++ b/kernel/relay.c
> @@ -72,17 +72,18 @@ static void relay_free_page_array(struct page **array)
> }
>
> /**
> - * relay_mmap_buf: - mmap channel buffer to process address space
> - * @buf: relay channel buffer
> - * @vma: vm_area_struct describing memory to be mapped
> + * relay_mmap_prepare_buf: - mmap channel buffer to process address space
> + * @buf: the relay channel buffer
> + * @desc: describing what to map
> *
> * Returns 0 if ok, negative on error
> *
> * Caller should already have grabbed mmap_lock.
> */
> -static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
> +static int relay_mmap_prepare_buf(struct rchan_buf *buf,
> + struct vm_area_desc *desc)
> {
> - unsigned long length = vma->vm_end - vma->vm_start;
> + unsigned long length = vma_desc_size(desc);
>
> if (!buf)
> return -EBADF;
> @@ -90,9 +91,9 @@ static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
> if (length != (unsigned long)buf->chan->alloc_size)
> return -EINVAL;
>
> - vma->vm_ops = &relay_file_mmap_ops;
> - vm_flags_set(vma, VM_DONTEXPAND);
> - vma->vm_private_data = buf;
> + desc->vm_ops = &relay_file_mmap_ops;
> + desc->vm_flags |= VM_DONTEXPAND;
> + desc->private_data = buf;
>
> return 0;
> }
> @@ -749,16 +750,16 @@ static int relay_file_open(struct inode *inode, struct file *filp)
> }
>
> /**
> - * relay_file_mmap - mmap file op for relay files
> - * @filp: the file
> - * @vma: the vma describing what to map
> + * relay_file_mmap_prepare - mmap file op for relay files
> + * @desc: describing what to map
> *
> - * Calls upon relay_mmap_buf() to map the file into user space.
> + * Calls upon relay_mmap_prepare_buf() to map the file into user space.
> */
> -static int relay_file_mmap(struct file *filp, struct vm_area_struct *vma)
> +static int relay_file_mmap_prepare(struct vm_area_desc *desc)
> {
> - struct rchan_buf *buf = filp->private_data;
> - return relay_mmap_buf(buf, vma);
> + struct rchan_buf *buf = desc->file->private_data;
> +
> + return relay_mmap_prepare_buf(buf, desc);
> }
>
> /**
> @@ -1006,7 +1007,7 @@ static ssize_t relay_file_read(struct file *filp,
> const struct file_operations relay_file_operations = {
> .open = relay_file_open,
> .poll = relay_file_poll,
> - .mmap = relay_file_mmap,
> + .mmap_prepare = relay_file_mmap_prepare,
> .read = relay_file_read,
> .release = relay_file_release,
> };
> --
> 2.51.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (3 preceding siblings ...)
2025-09-10 20:21 ` [PATCH v2 04/16] relay: update relay to use mmap_prepare Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-12 17:57 ` David Hildenbrand
2025-09-10 20:22 ` [PATCH v2 06/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
` (11 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Now that we have the f_op->mmap_prepare() hook, having a static function
called __mmap_prepare() that has nothing to do with it is confusing, so
rename the function to __mmap_setup().
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/vma.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index abe0da33c844..36a9f4d453be 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2329,7 +2329,7 @@ static void update_ksm_flags(struct mmap_state *map)
}
/*
- * __mmap_prepare() - Prepare to gather any overlapping VMAs that need to be
+ * __mmap_setup() - Prepare to gather any overlapping VMAs that need to be
* unmapped once the map operation is completed, check limits, account mapping
* and clean up any pre-existing VMAs.
*
@@ -2338,7 +2338,7 @@ static void update_ksm_flags(struct mmap_state *map)
*
* Returns: 0 on success, error code otherwise.
*/
-static int __mmap_prepare(struct mmap_state *map, struct list_head *uf)
+static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
{
int error;
struct vma_iterator *vmi = map->vmi;
@@ -2649,7 +2649,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
map.check_ksm_early = can_set_ksm_flags_early(&map);
- error = __mmap_prepare(&map, uf);
+ error = __mmap_setup(&map, uf);
if (!error && have_mmap_prepare)
error = call_mmap_prepare(&map);
if (error)
@@ -2679,7 +2679,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
return addr;
- /* Accounting was done by __mmap_prepare(). */
+ /* Accounting was done by __mmap_setup(). */
unacct_error:
if (map.charged)
vm_unacct_memory(map.charged);
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion
2025-09-10 20:22 ` [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
@ 2025-09-12 17:57 ` David Hildenbrand
2025-09-15 10:12 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: David Hildenbrand @ 2025-09-12 17:57 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, Konstantin Komarov, Baoquan He, Vivek Goyal,
Dave Young, Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On 10.09.25 22:22, Lorenzo Stoakes wrote:
> Now we have the f_op->mmap_prepare() hook, having a static function called
> __mmap_prepare() that has nothing to do with it is confusing, so rename the
> function to __mmap_setup().
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion
2025-09-12 17:57 ` David Hildenbrand
@ 2025-09-15 10:12 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 10:12 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, Konstantin Komarov,
Baoquan He, Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre,
Dave Martin, James Morse, Alexander Viro, Christian Brauner,
Jan Kara, Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Fri, Sep 12, 2025 at 07:57:20PM +0200, David Hildenbrand wrote:
> On 10.09.25 22:22, Lorenzo Stoakes wrote:
> > Now we have the f_op->mmap_prepare() hook, having a static function called
> > __mmap_prepare() that has nothing to do with it is confusing, so rename the
> > function to __mmap_setup().
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
Thanks!
>
> --
> Cheers
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 06/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (4 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 05/16] mm/vma: rename __mmap_prepare() function to avoid confusion Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 07/16] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
` (10 subsequent siblings)
16 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
We need the ability to split PFN remap between updating the VMA and
performing the actual remap, in order to do away with the legacy f_op->mmap
hook.
To do so, update the PFN remap code to provide shared logic, and also make
remap_pfn_range_notrack() static, as its one user, io_mapping_map_user()
was removed in commit 9a4f90e24661 ("mm: remove mm/io-mapping.c").
Then, introduce remap_pfn_range_prepare(), which accepts a VMA descriptor
and PFN parameters, and remap_pfn_range_complete(), which accepts the same
parameters as remap_pfn_range().
remap_pfn_range_prepare() will set the CoW vma->vm_pgoff if necessary, so
it must be supplied with a correct PFN to do so. If the caller must hold
locks to be able to do this, those locks should be held across the
operation, and mmap_abort() should be provided to release the lock should
an error arise.
While we're here, also clean up the duplicated #ifdef
__HAVE_PFNMAP_TRACKING check and fold it into a single #ifdef/#else block.
We would prefer to define these functions in mm/internal.h; however, we
will do the same for the io_remap*() variants, and those have
arch-specific definitions that require access to the remap functions.
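The intended calling pattern, sketched from the description above - how
the complete step gets driven by the core is established later in the
series, so treat this as illustrative only:

	/* In f_op->mmap_prepare: no VMA exists yet. This stages
	 * VM_REMAP_FLAGS and the CoW vm_pgoff fixup on the descriptor. */
	remap_pfn_range_prepare(desc, pfn);

	/* Later, once the VMA has been created and inserted: populate
	 * the page tables. The VMA itself is no longer modified here. */
	err = remap_pfn_range_complete(vma, vma->vm_start, pfn,
				       vma->vm_end - vma->vm_start,
				       vma->vm_page_prot);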
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mm.h | 25 +++++++--
mm/memory.c | 128 ++++++++++++++++++++++++++++-----------------
2 files changed, 102 insertions(+), 51 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0b97589aec6d..0e256823799d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -489,6 +489,21 @@ extern unsigned int kobjsize(const void *objp);
*/
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
+/*
+ * Physically remapped pages are special. Tell the
+ * rest of the world about it:
+ * VM_IO tells people not to look at these pages
+ * (accesses can have side effects).
+ * VM_PFNMAP tells the core MM that the base pages are just
+ * raw PFN mappings, and do not have a "struct page" associated
+ * with them.
+ * VM_DONTEXPAND
+ * Disable vma merging and expanding with mremap().
+ * VM_DONTDUMP
+ * Omit vma from core dump, even when VM_IO turned off.
+ */
+#define VM_REMAP_FLAGS (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP)
+
/* This mask prevents VMA from being scanned with khugepaged */
#define VM_NO_KHUGEPAGED (VM_SPECIAL | VM_HUGETLB)
@@ -3623,10 +3638,12 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
struct vm_area_struct *find_extend_vma_locked(struct mm_struct *,
unsigned long addr);
-int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t);
-int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot);
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
+
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
struct page **pages, unsigned long *num);
diff --git a/mm/memory.c b/mm/memory.c
index 3e0404bd57a0..5c4d5261996d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2903,8 +2903,27 @@ static inline int remap_p4d_range(struct mm_struct *mm, pgd_t *pgd,
return 0;
}
+static int get_remap_pgoff(vm_flags_t vm_flags, unsigned long addr,
+ unsigned long end, unsigned long vm_start, unsigned long vm_end,
+ unsigned long pfn, pgoff_t *vm_pgoff_p)
+{
+ /*
+ * There's a horrible special case to handle copy-on-write
+ * behaviour that some programs depend on. We mark the "original"
+ * un-COW'ed pages by matching them up with "vma->vm_pgoff".
+ * See vm_normal_page() for details.
+ */
+ if (is_cow_mapping(vm_flags)) {
+ if (addr != vm_start || end != vm_end)
+ return -EINVAL;
+ *vm_pgoff_p = pfn;
+ }
+
+ return 0;
+}
+
static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+ unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
{
pgd_t *pgd;
unsigned long next;
@@ -2915,32 +2934,17 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
return -EINVAL;
- /*
- * Physically remapped pages are special. Tell the
- * rest of the world about it:
- * VM_IO tells people not to look at these pages
- * (accesses can have side effects).
- * VM_PFNMAP tells the core MM that the base pages are just
- * raw PFN mappings, and do not have a "struct page" associated
- * with them.
- * VM_DONTEXPAND
- * Disable vma merging and expanding with mremap().
- * VM_DONTDUMP
- * Omit vma from core dump, even when VM_IO turned off.
- *
- * There's a horrible special case to handle copy-on-write
- * behaviour that some programs depend on. We mark the "original"
- * un-COW'ed pages by matching them up with "vma->vm_pgoff".
- * See vm_normal_page() for details.
- */
- if (is_cow_mapping(vma->vm_flags)) {
- if (addr != vma->vm_start || end != vma->vm_end)
- return -EINVAL;
- vma->vm_pgoff = pfn;
+ if (set_vma) {
+ err = get_remap_pgoff(vma->vm_flags, addr, end,
+ vma->vm_start, vma->vm_end,
+ pfn, &vma->vm_pgoff);
+ if (err)
+ return err;
+ vm_flags_set(vma, VM_REMAP_FLAGS);
+ } else {
+ VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
}
- vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
-
BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
pgd = pgd_offset(mm, addr);
@@ -2960,11 +2964,10 @@ static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long ad
* Variant of remap_pfn_range that does not call track_pfn_remap. The caller
* must have pre-validated the caching bits of the pgprot_t.
*/
-int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
{
- int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
-
+ int error = remap_pfn_range_internal(vma, addr, pfn, size, prot, set_vma);
if (!error)
return 0;
@@ -2977,6 +2980,18 @@ int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
return error;
}
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+ /*
+ * We set addr=VMA start, end=VMA end here, so this won't fail, but we
+ * check it again on complete and will fail there if specified addr is
+ * invalid.
+ */
+ get_remap_pgoff(desc->vm_flags, desc->start, desc->end,
+ desc->start, desc->end, pfn, &desc->pgoff);
+ desc->vm_flags |= VM_REMAP_FLAGS;
+}
+
#ifdef __HAVE_PFNMAP_TRACKING
static inline struct pfnmap_track_ctx *pfnmap_track_ctx_alloc(unsigned long pfn,
unsigned long size, pgprot_t *prot)
@@ -3005,23 +3020,9 @@ void pfnmap_track_ctx_release(struct kref *ref)
pfnmap_untrack(ctx->pfn, ctx->size);
kfree(ctx);
}
-#endif /* __HAVE_PFNMAP_TRACKING */
-/**
- * remap_pfn_range - remap kernel memory to userspace
- * @vma: user vma to map to
- * @addr: target page aligned user address to start at
- * @pfn: page frame number of kernel physical memory address
- * @size: size of mapping area
- * @prot: page protection flags for this mapping
- *
- * Note: this is only safe if the mm semaphore is held when called.
- *
- * Return: %0 on success, negative error code otherwise.
- */
-#ifdef __HAVE_PFNMAP_TRACKING
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static int remap_pfn_range_track(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot, bool set_vma)
{
struct pfnmap_track_ctx *ctx = NULL;
int err;
@@ -3047,7 +3048,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
return -EINVAL;
}
- err = remap_pfn_range_notrack(vma, addr, pfn, size, prot);
+ err = remap_pfn_range_notrack(vma, addr, pfn, size, prot, set_vma);
if (ctx) {
if (err)
kref_put(&ctx->kref, pfnmap_track_ctx_release);
@@ -3057,11 +3058,44 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
return err;
}
+/**
+ * remap_pfn_range - remap kernel memory to userspace
+ * @vma: user vma to map to
+ * @addr: target page aligned user address to start at
+ * @pfn: page frame number of kernel physical memory address
+ * @size: size of mapping area
+ * @prot: page protection flags for this mapping
+ *
+ * Note: this is only safe if the mm semaphore is held when called.
+ *
+ * Return: %0 on success, negative error code otherwise.
+ */
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range_track(vma, addr, pfn, size, prot,
+ /* set_vma = */true);
+}
+
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ /* With set_vma = false, the VMA will not be modified. */
+ return remap_pfn_range_track(vma, addr, pfn, size, prot,
+ /* set_vma = */false);
+}
#else
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot)
{
- return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
+ return remap_pfn_range_notrack(vma, addr, pfn, size, prot, /* set_vma = */true);
+}
+
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range_notrack(vma, addr, pfn, size, prot,
+ /* set_vma = */false);
}
#endif
EXPORT_SYMBOL(remap_pfn_range);
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 07/16] mm: introduce io_remap_pfn_range_[prepare, complete]()
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (5 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 06/16] mm: add remap_pfn_range_prepare(), remap_pfn_range_complete() Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-12 10:23 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
` (9 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
We introduce the io_remap*() equivalents of remap_pfn_range_prepare() and
remap_pfn_range_complete() to allow for I/O remapping via mmap_prepare.
We have to make some architecture-specific changes for those architectures
which define customised handlers.
It doesn't really make sense to make these internal-only, as arches specify
their own versions of these functions, so we declare them in mm.h.
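To illustrate the intended two-phase use, here is a minimal sketch - the
foo_* names and the way the PFN is obtained are hypothetical, not part of
this series:
static int foo_mmap_prepare(struct vm_area_desc *desc)
{
	/* Hypothetical driver-specific PFN lookup. */
	unsigned long pfn = foo_device_pfn(desc->file);
	/* Phase 1: no VMA exists yet; only the descriptor is updated. */
	io_remap_pfn_range_prepare(desc, pfn, vma_desc_size(desc));
	return 0;
}
/* Phase 2: invoked only once the VMA is fully established. */
static int foo_remap_complete(struct vm_area_struct *vma, unsigned long pfn)
{
	return io_remap_pfn_range_complete(vma, vma->vm_start, pfn,
			vma->vm_end - vma->vm_start, vma->vm_page_prot);
}
The machinery that actually invokes the completion phase is introduced
separately.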
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
arch/csky/include/asm/pgtable.h | 5 +++++
arch/mips/alchemy/common/setup.c | 28 +++++++++++++++++++++++++---
arch/mips/include/asm/pgtable.h | 10 ++++++++++
arch/sparc/include/asm/pgtable_32.h | 29 +++++++++++++++++++++++++----
arch/sparc/include/asm/pgtable_64.h | 29 +++++++++++++++++++++++++----
include/linux/mm.h | 18 ++++++++++++++++++
6 files changed, 108 insertions(+), 11 deletions(-)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index 5a394be09c35..c83505839a06 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -266,4 +266,9 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
remap_pfn_range(vma, vaddr, pfn, size, prot)
+/* default io_remap_pfn_range_prepare can be used. */
+
+#define io_remap_pfn_range_complete(vma, addr, pfn, size, prot) \
+ remap_pfn_range_complete(vma, addr, pfn, size, prot)
+
#endif /* __ASM_CSKY_PGTABLE_H */
diff --git a/arch/mips/alchemy/common/setup.c b/arch/mips/alchemy/common/setup.c
index a7a6d31a7a41..a4ab02776994 100644
--- a/arch/mips/alchemy/common/setup.c
+++ b/arch/mips/alchemy/common/setup.c
@@ -94,12 +94,34 @@ phys_addr_t fixup_bigphys_addr(phys_addr_t phys_addr, phys_addr_t size)
return phys_addr;
}
-int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
- unsigned long pfn, unsigned long size, pgprot_t prot)
+static unsigned long calc_pfn(unsigned long pfn, unsigned long size)
{
phys_addr_t phys_addr = fixup_bigphys_addr(pfn << PAGE_SHIFT, size);
- return remap_pfn_range(vma, vaddr, phys_addr >> PAGE_SHIFT, size, prot);
+ return phys_addr >> PAGE_SHIFT;
+}
+
+int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
+ unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range(vma, vaddr, calc_pfn(pfn, size), size, prot);
}
EXPORT_SYMBOL(io_remap_pfn_range);
+
+void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
+ unsigned long size)
+{
+ remap_pfn_range_prepare(desc, calc_pfn(pfn, size));
+}
+EXPORT_SYMBOL(io_remap_pfn_range_prepare);
+
+int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t prot)
+{
+ return remap_pfn_range_complete(vma, addr, calc_pfn(pfn, size),
+ size, prot);
+}
+EXPORT_SYMBOL(io_remap_pfn_range_complete);
+
#endif /* CONFIG_MIPS_FIXUP_BIGPHYS_ADDR */
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index ae73ecf4c41a..6a8964f55a31 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -607,6 +607,16 @@ phys_addr_t fixup_bigphys_addr(phys_addr_t addr, phys_addr_t size);
int io_remap_pfn_range(struct vm_area_struct *vma, unsigned long vaddr,
unsigned long pfn, unsigned long size, pgprot_t prot);
#define io_remap_pfn_range io_remap_pfn_range
+
+void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
+ unsigned long size);
+#define io_remap_pfn_range_prepare io_remap_pfn_range_prepare
+
+int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t prot);
+#define io_remap_pfn_range_complete io_remap_pfn_range_complete
+
#else
#define fixup_bigphys_addr(addr, size) (addr)
#endif /* CONFIG_MIPS_FIXUP_BIGPHYS_ADDR */
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 7c199c003ffe..cfd764afc107 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -398,9 +398,7 @@ __get_iospace (unsigned long addr)
int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
unsigned long, pgprot_t);
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long from, unsigned long pfn,
- unsigned long size, pgprot_t prot)
+static inline unsigned long calc_io_remap_pfn(unsigned long pfn)
{
unsigned long long offset, space, phys_base;
@@ -408,10 +406,33 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
space = GET_IOSPACE(pfn);
phys_base = offset | (space << 32ULL);
- return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
+ return phys_base >> PAGE_SHIFT;
+}
+
+static inline int io_remap_pfn_range(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range(vma, from, calc_io_remap_pfn(pfn), size, prot);
}
#define io_remap_pfn_range io_remap_pfn_range
+static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
+ unsigned long size)
+{
+ remap_pfn_range_prepare(desc, calc_io_remap_pfn(pfn));
+}
+#define io_remap_pfn_range_prepare io_remap_pfn_range_prepare
+
+static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t prot)
+{
+ return remap_pfn_range_complete(vma, addr, calc_io_remap_pfn(pfn),
+ size, prot);
+}
+#define io_remap_pfn_range_complete io_remap_pfn_range_complete
+
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
({ \
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 669cd02469a1..b8000ce4b59f 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1084,9 +1084,7 @@ static inline int arch_unmap_one(struct mm_struct *mm,
return 0;
}
-static inline int io_remap_pfn_range(struct vm_area_struct *vma,
- unsigned long from, unsigned long pfn,
- unsigned long size, pgprot_t prot)
+static inline unsigned long calc_io_remap_pfn(unsigned long pfn)
{
unsigned long offset = GET_PFN(pfn) << PAGE_SHIFT;
int space = GET_IOSPACE(pfn);
@@ -1094,10 +1092,33 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
phys_base = offset | (((unsigned long) space) << 32UL);
- return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
+ return phys_base >> PAGE_SHIFT;
+}
+
+static inline int io_remap_pfn_range(struct vm_area_struct *vma,
+ unsigned long from, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range(vma, from, calc_io_remap_pfn(pfn), size, prot);
}
#define io_remap_pfn_range io_remap_pfn_range
+static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
+ unsigned long size)
+{
+ remap_pfn_range_prepare(desc, calc_io_remap_pfn(pfn));
+}
+#define io_remap_pfn_range_prepare io_remap_pfn_range_prepare
+
+static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t prot)
+{
+ return remap_pfn_range_complete(vma, addr, calc_io_remap_pfn(pfn),
+ size, prot);
+}
+#define io_remap_pfn_range_complete io_remap_pfn_range_complete
+
static inline unsigned long __untagged_addr(unsigned long start)
{
if (adi_capable()) {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e256823799d..cca149bb8ef1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3685,6 +3685,24 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
}
#endif
+#ifndef io_remap_pfn_range_prepare
+static inline void io_remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn,
+ unsigned long size)
+{
+ remap_pfn_range_prepare(desc, pfn);
+}
+#endif
+
+#ifndef io_remap_pfn_range_complete
+static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t prot)
+{
+ return remap_pfn_range_complete(vma, addr, pfn, size,
+ pgprot_decrypted(prot));
+}
+#endif
+
static inline vm_fault_t vmf_error(int err)
{
if (err == -ENOMEM)
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 07/16] mm: introduce io_remap_pfn_range_[prepare, complete]()
2025-09-10 20:22 ` [PATCH v2 07/16] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
@ 2025-09-12 10:23 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-12 10:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Hi Andrew,
Could you apply the below fix-patch to address the delights and wonders of
arch-specific header stuff? :)
Cheers, Lorenzo
----8<----
From 1a8ddbbb3aab15104e7b7b5b7a5a286dd23d8325 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 12 Sep 2025 10:58:23 +0100
Subject: [PATCH] sparc fix
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
arch/sparc/include/asm/pgtable_32.h | 3 +++
arch/sparc/include/asm/pgtable_64.h | 3 +++
2 files changed, 6 insertions(+)
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index cfd764afc107..30749c5ffe95 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -397,6 +397,9 @@ __get_iospace (unsigned long addr)
int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
unsigned long, pgprot_t);
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
static inline unsigned long calc_io_remap_pfn(unsigned long pfn)
{
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index b8000ce4b59f..b06f55915653 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1050,6 +1050,9 @@ int page_in_phys_avail(unsigned long paddr);
int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
unsigned long, pgprot_t);
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot);
void adi_restore_tags(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pte_t pte);
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (6 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 07/16] mm: introduce io_remap_pfn_range_[prepare, complete]() Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-11 22:07 ` Reinette Chatre
` (4 more replies)
2025-09-10 20:22 ` [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
` (8 subsequent siblings)
16 siblings, 5 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Some drivers/filesystems need to perform additional tasks after the VMA is
set up. This is typically in the form of pre-population.
The forms of pre-population most likely to be performed are a PFN remap or
insertion of a mixed map, so we provide this functionality, ensuring that
we perform the appropriate actions at the appropriate time - that is
setting flags at the point of .mmap_prepare, and performing the actual
remap at the point at which the VMA is fully established.
This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.
Unfortunately callers still often require some kind of custom action, so
we add an optional success/error hook to allow the caller to do something
after the action has succeeded or failed.
This is done at the point when the VMA has already been established, so the
harm that can be done is limited.
The error hook can be used to filter errors if necessary.
We implement actions as abstracted from the vm_area_desc, so we provide the
ability for custom hooks to invoke actions distinct from the vma
descriptor.
If any error arises on these final actions, we simply unmap the VMA
altogether.
Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.
For drivers which perform truly custom logic, we provide a custom action
hook which is invoked at the point of action execution.
This can then, in turn, update the desc object and perform other actions,
such as partially remapping ranges for instance. We export
mmap_action_prepare() and mmap_action_complete() for drivers to do
this.
This is performed at a stage where the VMA is already established,
immediately prior to mapping completion, so it is considerably less
problematic than a general mmap hook.
Note that at the point of the action being taken, the VMA is visible via
the rmap, only the VMA write lock is held, so if anything needs to access
the VMA, it is able to.
Essentially the action is taken as if it were performed after the mapping,
but is kept atomic with VMA state.
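As an illustration, a driver conversion might then look something like the
below - a minimal sketch against the API introduced here, with all foo_*
names hypothetical:
static int foo_mmap_prepare(struct vm_area_desc *desc)
{
	unsigned long pfn = foo_device_pfn(desc->file); /* hypothetical */
	desc->vm_ops = &foo_vm_ops;
	/* Request a PFN remap once the VMA is fully established. */
	mmap_action_remap(&desc->action, desc->start, pfn,
			  vma_desc_size(desc), desc->page_prot);
	/* Optionally filter any error arising from the deferred remap. */
	desc->action.error_hook = foo_filter_error;
	return 0;
}
Here foo_filter_error() might, say, pass -ENOMEM through unchanged and
translate anything else into -EINVAL.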
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mm.h | 30 ++++++
include/linux/mm_types.h | 61 ++++++++++++
mm/util.c | 150 +++++++++++++++++++++++++++-
mm/vma.c | 70 ++++++++-----
tools/testing/vma/vma_internal.h | 164 ++++++++++++++++++++++++++++++-
5 files changed, 447 insertions(+), 28 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cca149bb8ef1..2ceead3ffcf0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3597,6 +3597,36 @@ static inline unsigned long vma_desc_pages(struct vm_area_desc *desc)
return vma_desc_size(desc) >> PAGE_SHIFT;
}
+static inline void mmap_action_remap(struct mmap_action *action,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t pgprot)
+{
+ action->type = MMAP_REMAP_PFN;
+
+ action->remap.addr = addr;
+ action->remap.pfn = pfn;
+ action->remap.size = size;
+ action->remap.pgprot = pgprot;
+}
+
+static inline void mmap_action_mixedmap(struct mmap_action *action,
+ unsigned long addr, unsigned long pfn, unsigned long num_pages)
+{
+ action->type = MMAP_INSERT_MIXED;
+
+ action->mixedmap.addr = addr;
+ action->mixedmap.pfn = pfn;
+ action->mixedmap.num_pages = num_pages;
+}
+
+struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
+ unsigned long addr, unsigned long num_pages);
+
+void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc);
+int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma);
+
/* Look up the first VMA which exactly match the interval vm_start ... vm_end */
static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
unsigned long vm_start, unsigned long vm_end)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a441f78340d..ae6c7a0a18a7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -770,6 +770,64 @@ struct pfnmap_track_ctx {
};
#endif
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+ MMAP_NOTHING, /* Mapping is complete, no further action. */
+ MMAP_REMAP_PFN, /* Remap PFN range based on desc->remap. */
+ MMAP_INSERT_MIXED, /* Mixed map based on desc->mixedmap. */
+ MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
+ MMAP_CUSTOM_ACTION, /* User-provided hook. */
+};
+
+struct mmap_action {
+ union {
+ /* Remap range. */
+ struct {
+ unsigned long addr;
+ unsigned long pfn;
+ unsigned long size;
+ pgprot_t pgprot;
+ } remap;
+ /* Insert mixed map. */
+ struct {
+ unsigned long addr;
+ unsigned long pfn;
+ unsigned long num_pages;
+ } mixedmap;
+ /* Insert specific mixed map pages. */
+ struct {
+ unsigned long addr;
+ struct page **pages;
+ unsigned long num_pages;
+ /* kfree pages on completion? */
+ bool kfree_pages :1;
+ } mixedmap_pages;
+ struct {
+ int (*action_hook)(struct vm_area_struct *vma);
+ } custom;
+ };
+ enum mmap_action_type type;
+
+ /*
+ * If specified, this hook is invoked after the selected action has been
+ * successfully completed. Not that the VMA write lock still held.
+ *
+ * The absolute minimum ought to be done here.
+ *
+ * Returns 0 on success, or an error code.
+ */
+ int (*success_hook)(struct vm_area_struct *vma);
+
+ /*
+ * If specified, this hook is invoked when an error occurs while
+ * attempting the selected action.
+ *
+ * The hook can return an error code in order to filter the error, but
+ * it is not valid to clear the error here.
+ */
+ int (*error_hook)(int err);
+};
+
/*
* Describes a VMA that is about to be mmap()'ed. Drivers may choose to
* manipulate mutable fields which will cause those fields to be updated in the
@@ -793,6 +851,9 @@ struct vm_area_desc {
/* Write-only fields. */
const struct vm_operations_struct *vm_ops;
void *private_data;
+
+ /* Take further action? */
+ struct mmap_action action;
};
/*
diff --git a/mm/util.c b/mm/util.c
index 248f877f629b..11752d67b89c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1155,15 +1155,18 @@ int __compat_vma_mmap_prepare(const struct file_operations *f_op,
.vm_file = vma->vm_file,
.vm_flags = vma->vm_flags,
.page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
};
int err;
err = f_op->mmap_prepare(&desc);
if (err)
return err;
- set_vma_from_desc(vma, &desc);
- return 0;
+ mmap_action_prepare(&desc.action, &desc);
+ set_vma_from_desc(vma, &desc);
+ return mmap_action_complete(&desc.action, vma);
}
EXPORT_SYMBOL(__compat_vma_mmap_prepare);
@@ -1279,6 +1282,149 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
}
}
+struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
+ unsigned long addr, unsigned long num_pages)
+{
+ struct page **pages;
+
+ pages = kmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
+ if (!pages)
+ return NULL;
+
+ action->type = MMAP_INSERT_MIXED_PAGES;
+
+ action->mixedmap_pages.addr = addr;
+ action->mixedmap_pages.num_pages = num_pages;
+ action->mixedmap_pages.kfree_pages = true;
+ action->mixedmap_pages.pages = pages;
+
+ return pages;
+}
+EXPORT_SYMBOL(mmap_action_mixedmap_pages);
+
+/**
+ * mmap_action_prepare - Perform preparatory setup for a VMA descriptor
+ * action which needs to be performed.
+ * @desc: The VMA descriptor to prepare for @action.
+ * @action: The action to perform.
+ *
+ * Other than internal mm use, this is intended to be used by mmap_prepare code
+ * which specifies a custom action hook and needs to prepare for another action
+ * it wishes to perform.
+ */
+void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc)
+{
+ switch (action->type) {
+ case MMAP_NOTHING:
+ case MMAP_CUSTOM_ACTION:
+ break;
+ case MMAP_REMAP_PFN:
+ remap_pfn_range_prepare(desc, action->remap.pfn);
+ break;
+ case MMAP_INSERT_MIXED:
+ case MMAP_INSERT_MIXED_PAGES:
+ desc->vm_flags |= VM_MIXEDMAP;
+ break;
+ }
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+/**
+ * mmap_action_complete - Execute VMA descriptor action.
+ * @action: The action to perform.
+ * @vma: The VMA to perform the action upon.
+ *
+ * Similar to mmap_action_prepare(), other than internal mm usage this is
+ * intended for mmap_prepare users who implement a custom hook - with this
+ * function being called from the custom hook itself.
+ *
+ * Return: 0 on success, or error, at which point the VMA will be unmapped.
+ */
+int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma)
+{
+ int err = 0;
+
+ switch (action->type) {
+ case MMAP_NOTHING:
+ break;
+ case MMAP_REMAP_PFN:
+ VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
+ VM_REMAP_FLAGS);
+
+ err = remap_pfn_range_complete(vma, action->remap.addr,
+ action->remap.pfn, action->remap.size,
+ action->remap.pgprot);
+
+ break;
+ case MMAP_INSERT_MIXED:
+ {
+ unsigned long pgnum = 0;
+ unsigned long pfn = action->mixedmap.pfn;
+ unsigned long addr = action->mixedmap.addr;
+ unsigned long vaddr = vma->vm_start;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
+
+ for (; pgnum < action->mixedmap.num_pages;
+ pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
+ vm_fault_t vmf;
+
+ vmf = vmf_insert_mixed(vma, vaddr, addr);
+ if (vmf & VM_FAULT_ERROR) {
+ err = vm_fault_to_errno(vmf, 0);
+ break;
+ }
+ }
+
+ break;
+ }
+ case MMAP_INSERT_MIXED_PAGES:
+ {
+ struct page **pages = action->mixedmap_pages.pages;
+ unsigned long nr_pages = action->mixedmap_pages.num_pages;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
+
+ err = vm_insert_pages(vma, action->mixedmap_pages.addr,
+ pages, &nr_pages);
+ if (action->mixedmap_pages.kfree_pages)
+ kfree(pages);
+ break;
+ }
+ case MMAP_CUSTOM_ACTION:
+ err = action->custom.action_hook(vma);
+ break;
+ }
+
+ /*
+ * If an error occurs, unmap the VMA altogether and return an error. We
+ * only clear the newly allocated VMA, since this function is only
+ * invoked if we do NOT merge, so we only clean up the VMA we created.
+ */
+ if (err) {
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ do_munmap(current->mm, vma->vm_start, len, NULL);
+
+ if (action->error_hook) {
+ /* We may want to filter the error. */
+ err = action->error_hook(err);
+
+ /* The caller should not clear the error. */
+ VM_WARN_ON_ONCE(!err);
+ }
+ return err;
+ }
+
+ if (action->success_hook)
+ err = action->success_hook(vma);
+
+ return err;
+}
+EXPORT_SYMBOL(mmap_action_complete);
+
#ifdef CONFIG_MMU
/**
* folio_pte_batch - detect a PTE batch for a large folio
diff --git a/mm/vma.c b/mm/vma.c
index 36a9f4d453be..a1ec405bda25 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2328,17 +2328,33 @@ static void update_ksm_flags(struct mmap_state *map)
map->vm_flags = ksm_vma_flags(map->mm, map->file, map->vm_flags);
}
+static void set_desc_from_map(struct vm_area_desc *desc,
+ const struct mmap_state *map)
+{
+ desc->start = map->addr;
+ desc->end = map->end;
+
+ desc->pgoff = map->pgoff;
+ desc->vm_file = map->file;
+ desc->vm_flags = map->vm_flags;
+ desc->page_prot = map->page_prot;
+}
+
/*
* __mmap_setup() - Prepare to gather any overlapping VMAs that need to be
* unmapped once the map operation is completed, check limits, account mapping
* and clean up any pre-existing VMAs.
*
+ * As a result it sets up the @map and @desc objects.
+ *
* @map: Mapping state.
+ * @desc: VMA descriptor
* @uf: Userfaultfd context list.
*
* Returns: 0 on success, error code otherwise.
*/
-static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
+static int __mmap_setup(struct mmap_state *map, struct vm_area_desc *desc,
+ struct list_head *uf)
{
int error;
struct vma_iterator *vmi = map->vmi;
@@ -2395,6 +2411,7 @@ static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
*/
vms_clean_up_area(vms, &map->mas_detach);
+ set_desc_from_map(desc, map);
return 0;
}
@@ -2567,34 +2584,26 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
*
* Returns 0 on success, or an error code otherwise.
*/
-static int call_mmap_prepare(struct mmap_state *map)
+static int call_mmap_prepare(struct mmap_state *map,
+ struct vm_area_desc *desc)
{
int err;
- struct vm_area_desc desc = {
- .mm = map->mm,
- .file = map->file,
- .start = map->addr,
- .end = map->end,
-
- .pgoff = map->pgoff,
- .vm_file = map->file,
- .vm_flags = map->vm_flags,
- .page_prot = map->page_prot,
- };
/* Invoke the hook. */
- err = vfs_mmap_prepare(map->file, &desc);
+ err = vfs_mmap_prepare(map->file, desc);
if (err)
return err;
+ mmap_action_prepare(&desc->action, desc);
+
/* Update fields permitted to be changed. */
- map->pgoff = desc.pgoff;
- map->file = desc.vm_file;
- map->vm_flags = desc.vm_flags;
- map->page_prot = desc.page_prot;
+ map->pgoff = desc->pgoff;
+ map->file = desc->vm_file;
+ map->vm_flags = desc->vm_flags;
+ map->page_prot = desc->page_prot;
/* User-defined fields. */
- map->vm_ops = desc.vm_ops;
- map->vm_private_data = desc.private_data;
+ map->vm_ops = desc->vm_ops;
+ map->vm_private_data = desc->private_data;
return 0;
}
@@ -2642,16 +2651,24 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
- int error;
bool have_mmap_prepare = file && file->f_op->mmap_prepare;
VMA_ITERATOR(vmi, mm, addr);
MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
+ struct vm_area_desc desc = {
+ .mm = mm,
+ .file = file,
+ .action = {
+ .type = MMAP_NOTHING, /* Default to no further action. */
+ },
+ };
+ bool allocated_new = false;
+ int error;
map.check_ksm_early = can_set_ksm_flags_early(&map);
- error = __mmap_setup(&map, uf);
+ error = __mmap_setup(&map, &desc, uf);
if (!error && have_mmap_prepare)
- error = call_mmap_prepare(&map);
+ error = call_mmap_prepare(&map, &desc);
if (error)
goto abort_munmap;
@@ -2670,6 +2687,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
error = __mmap_new_vma(&map, &vma);
if (error)
goto unacct_error;
+ allocated_new = true;
}
if (have_mmap_prepare)
@@ -2677,6 +2695,12 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
__mmap_complete(&map, vma);
+ if (have_mmap_prepare && allocated_new) {
+ error = mmap_action_complete(&desc.action, vma);
+ if (error)
+ return error;
+ }
+
return addr;
/* Accounting was done by __mmap_setup(). */
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 07167446dcf4..c21642974798 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -170,6 +170,28 @@ typedef __bitwise unsigned int vm_fault_t;
#define swap(a, b) \
do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
+enum vm_fault_reason {
+ VM_FAULT_OOM = (__force vm_fault_t)0x000001,
+ VM_FAULT_SIGBUS = (__force vm_fault_t)0x000002,
+ VM_FAULT_MAJOR = (__force vm_fault_t)0x000004,
+ VM_FAULT_HWPOISON = (__force vm_fault_t)0x000010,
+ VM_FAULT_HWPOISON_LARGE = (__force vm_fault_t)0x000020,
+ VM_FAULT_SIGSEGV = (__force vm_fault_t)0x000040,
+ VM_FAULT_NOPAGE = (__force vm_fault_t)0x000100,
+ VM_FAULT_LOCKED = (__force vm_fault_t)0x000200,
+ VM_FAULT_RETRY = (__force vm_fault_t)0x000400,
+ VM_FAULT_FALLBACK = (__force vm_fault_t)0x000800,
+ VM_FAULT_DONE_COW = (__force vm_fault_t)0x001000,
+ VM_FAULT_NEEDDSYNC = (__force vm_fault_t)0x002000,
+ VM_FAULT_COMPLETED = (__force vm_fault_t)0x004000,
+ VM_FAULT_HINDEX_MASK = (__force vm_fault_t)0x0f0000,
+};
+#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | \
+ VM_FAULT_SIGSEGV | VM_FAULT_HWPOISON | \
+ VM_FAULT_HWPOISON_LARGE | VM_FAULT_FALLBACK)
+
+#define FOLL_HWPOISON (1 << 6)
+
struct kref {
refcount_t refcount;
};
@@ -274,6 +296,92 @@ struct mm_struct {
struct vm_area_struct;
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+ MMAP_NOTHING, /* Mapping is complete, no further action. */
+ MMAP_REMAP_PFN, /* Remap PFN range based on desc->remap. */
+ MMAP_INSERT_MIXED, /* Mixed map based on desc->mixedmap. */
+ MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
+ MMAP_CUSTOM_ACTION, /* User-provided hook. */
+};
+
+struct mmap_action {
+ union {
+ /* Remap range. */
+ struct {
+ unsigned long addr;
+ unsigned long pfn;
+ unsigned long size;
+ pgprot_t pgprot;
+ } remap;
+ /* Insert mixed map. */
+ struct {
+ unsigned long addr;
+ unsigned long pfn;
+ unsigned long num_pages;
+ } mixedmap;
+ /* Insert specific mixed map pages. */
+ struct {
+ unsigned long addr;
+ struct page **pages;
+ unsigned long num_pages;
+ /* kfree pages on completion? */
+ bool kfree_pages :1;
+ } mixedmap_pages;
+ struct {
+ int (*action_hook)(struct vm_area_struct *vma);
+ } custom;
+ };
+ enum mmap_action_type type;
+
+ /*
+ * If specified, this hook is invoked after the selected action has been
+ * successfully completed. Not that the VMA write lock still held.
+ *
+ * The absolute minimum ought to be done here.
+ *
+ * Returns 0 on success, or an error code.
+ */
+ int (*success_hook)(struct vm_area_struct *vma);
+
+ /*
+ * If specified, this hook is invoked when an error occurs while
+ * attempting the selected action.
+ *
+ * The hook can return an error code in order to filter the error, but
+ * it is not valid to clear the error here.
+ */
+ int (*error_hook)(int err);
+};
+
+/*
+ * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
+ * manipulate mutable fields which will cause those fields to be updated in the
+ * resultant VMA.
+ *
+ * Helper functions are not required for manipulating any field.
+ */
+struct vm_area_desc {
+ /* Immutable state. */
+ const struct mm_struct *const mm;
+ struct file *const file; /* May vary from vm_file in stacked callers. */
+ unsigned long start;
+ unsigned long end;
+
+ /* Mutable fields. Populated with initial state. */
+ pgoff_t pgoff;
+ struct file *vm_file;
+ vm_flags_t vm_flags;
+ pgprot_t page_prot;
+
+ /* Write-only fields. */
+ const struct vm_operations_struct *vm_ops;
+ void *private_data;
+
+ /* Take further action? */
+ struct mmap_action action;
+};
+
/*
* Describes a VMA that is about to be mmap()'ed. Drivers may choose to
* manipulate mutable fields which will cause those fields to be updated in the
@@ -297,6 +405,9 @@ struct vm_area_desc {
/* Write-only fields. */
const struct vm_operations_struct *vm_ops;
void *private_data;
+
+ /* Take further action? */
+ struct mmap_action action;
};
struct file_operations {
@@ -1466,12 +1577,23 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
static inline void set_vma_from_desc(struct vm_area_struct *vma,
struct vm_area_desc *desc);
+static inline void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc)
+{
+}
+
+static inline int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma)
+{
+ return 0;
+}
+
static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
struct file *file, struct vm_area_struct *vma)
{
struct vm_area_desc desc = {
.mm = vma->vm_mm,
- .file = vma->vm_file,
+ .file = file,
.start = vma->vm_start,
.end = vma->vm_end,
@@ -1479,15 +1601,18 @@ static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
.vm_file = vma->vm_file,
.vm_flags = vma->vm_flags,
.page_prot = vma->vm_page_prot,
+
+ .action.type = MMAP_NOTHING, /* Default */
};
int err;
err = f_op->mmap_prepare(&desc);
if (err)
return err;
- set_vma_from_desc(vma, &desc);
- return 0;
+ mmap_action_prepare(&desc.action, &desc);
+ set_vma_from_desc(vma, &desc);
+ return mmap_action_complete(&desc.action, vma);
}
static inline int compat_vma_mmap_prepare(struct file *file,
@@ -1548,4 +1673,37 @@ static inline vm_flags_t ksm_vma_flags(const struct mm_struct *, const struct fi
return vm_flags;
}
+static inline void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+}
+
+static inline int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn, unsigned long size, pgprot_t pgprot)
+{
+ return 0;
+}
+
+static inline vm_fault_t vmf_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long pfn)
+{
+ return 0;
+}
+
+static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
+{
+ if (vm_fault & VM_FAULT_OOM)
+ return -ENOMEM;
+ if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
+ return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
+ if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
+ return -EFAULT;
+ return 0;
+}
+
+static inline int do_munmap(struct mm_struct *, unsigned long, size_t,
+ struct list_head *uf)
+{
+ return 0;
+}
+
#endif /* __MM_VMA_INTERNAL_H */
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
@ 2025-09-11 22:07 ` Reinette Chatre
2025-09-12 10:18 ` Lorenzo Stoakes
2025-09-12 10:25 ` Lorenzo Stoakes
` (3 subsequent siblings)
4 siblings, 1 reply; 55+ messages in thread
From: Reinette Chatre @ 2025-09-11 22:07 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
Hi Lorenzo,
On 9/10/25 1:22 PM, Lorenzo Stoakes wrote:
...
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 4a441f78340d..ae6c7a0a18a7 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -770,6 +770,64 @@ struct pfnmap_track_ctx {
> };
> #endif
>
> +/* What action should be taken after an .mmap_prepare call is complete? */
> +enum mmap_action_type {
> + MMAP_NOTHING, /* Mapping is complete, no further action. */
> + MMAP_REMAP_PFN, /* Remap PFN range based on desc->remap. */
> + MMAP_INSERT_MIXED, /* Mixed map based on desc->mixedmap. */
> + MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
> + MMAP_CUSTOM_ACTION, /* User-provided hook. */
> +};
> +
> +struct mmap_action {
> + union {
> + /* Remap range. */
> + struct {
> + unsigned long addr;
> + unsigned long pfn;
> + unsigned long size;
> + pgprot_t pgprot;
> + } remap;
> + /* Insert mixed map. */
> + struct {
> + unsigned long addr;
> + unsigned long pfn;
> + unsigned long num_pages;
> + } mixedmap;
> + /* Insert specific mixed map pages. */
> + struct {
> + unsigned long addr;
> + struct page **pages;
> + unsigned long num_pages;
> + /* kfree pages on completion? */
> + bool kfree_pages :1;
> + } mixedmap_pages;
> + struct {
> + int (*action_hook)(struct vm_area_struct *vma);
> + } custom;
> + };
> + enum mmap_action_type type;
> +
> + /*
> + * If specified, this hook is invoked after the selected action has been
> + * successfully completed. Not that the VMA write lock still held.
A typo that may trip tired eyes: Not -> Note ? (perhaps also "is still held"?)
(also in the duplicate changes to tools/testing/vma/vma_internal.h)
> + *
> + * The absolute minimum ought to be done here.
> + *
> + * Returns 0 on success, or an error code.
> + */
> + int (*success_hook)(struct vm_area_struct *vma);
> +
> + /*
> + * If specified, this hook is invoked when an error occurs while
> + * attempting the selected action.
> + *
> + * The hook can return an error code in order to filter the error, but
> + * it is not valid to clear the error here.
> + */
> + int (*error_hook)(int err);
> +};
Reinette
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-11 22:07 ` Reinette Chatre
@ 2025-09-12 10:18 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-12 10:18 UTC (permalink / raw)
To: Reinette Chatre
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Dave Martin, James Morse, Alexander Viro,
Christian Brauner, Jan Kara, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Hugh Dickins,
Baolin Wang, Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov,
Jann Horn, Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel,
linux-csky, linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl,
linux-mm, ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Thu, Sep 11, 2025 at 03:07:21PM -0700, Reinette Chatre wrote:
> Hi Lorenzo,
>
> On 9/10/25 1:22 PM, Lorenzo Stoakes wrote:
>
> ...
>
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 4a441f78340d..ae6c7a0a18a7 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -770,6 +770,64 @@ struct pfnmap_track_ctx {
> > };
> > #endif
> >
> > +/* What action should be taken after an .mmap_prepare call is complete? */
> > +enum mmap_action_type {
> > + MMAP_NOTHING, /* Mapping is complete, no further action. */
> > + MMAP_REMAP_PFN, /* Remap PFN range based on desc->remap. */
> > + MMAP_INSERT_MIXED, /* Mixed map based on desc->mixedmap. */
> > + MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
> > + MMAP_CUSTOM_ACTION, /* User-provided hook. */
> > +};
> > +
> > +struct mmap_action {
> > + union {
> > + /* Remap range. */
> > + struct {
> > + unsigned long addr;
> > + unsigned long pfn;
> > + unsigned long size;
> > + pgprot_t pgprot;
> > + } remap;
> > + /* Insert mixed map. */
> > + struct {
> > + unsigned long addr;
> > + unsigned long pfn;
> > + unsigned long num_pages;
> > + } mixedmap;
> > + /* Insert specific mixed map pages. */
> > + struct {
> > + unsigned long addr;
> > + struct page **pages;
> > + unsigned long num_pages;
> > + /* kfree pages on completion? */
> > + bool kfree_pages :1;
> > + } mixedmap_pages;
> > + struct {
> > + int (*action_hook)(struct vm_area_struct *vma);
> > + } custom;
> > + };
> > + enum mmap_action_type type;
> > +
> > + /*
> > + * If specified, this hook is invoked after the selected action has been
> > + * successfully completed. Not that the VMA write lock still held.
>
> A typo that may trip tired eyes: Not -> Note ? (perhaps also "is still held"?)
> (also in the duplicate changes to tools/testing/vma/vma_internal.h)
Yeah good catch! Will fix if respin :)
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
2025-09-11 22:07 ` Reinette Chatre
@ 2025-09-12 10:25 ` Lorenzo Stoakes
2025-09-13 22:54 ` Chris Mason
` (2 subsequent siblings)
4 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-12 10:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Hi Andrew,
Could you apply the below fix-patch to make nommu happy? It also has a couple
trivial whitespace fixes in it.
Thanks, Lorenzo
----8<----
From 94d0d29ab23b48bd301eb7e4e9abe88546565d7a Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 12 Sep 2025 10:56:39 +0100
Subject: [PATCH] nommu fix
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/util.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 66 insertions(+), 2 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index 11752d67b89c..f0730efd34eb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1302,6 +1302,7 @@ struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
}
EXPORT_SYMBOL(mmap_action_mixedmap_pages);
+#ifdef CONFIG_MMU
/**
* mmap_action_prepare - Perform preparatory setup for a VMA descriptor
* action which needs to be performed.
@@ -1313,7 +1314,7 @@ EXPORT_SYMBOL(mmap_action_mixedmap_pages);
* it wishes to perform.
*/
void mmap_action_prepare(struct mmap_action *action,
- struct vm_area_desc *desc)
+ struct vm_area_desc *desc)
{
switch (action->type) {
case MMAP_NOTHING:
@@ -1342,7 +1343,7 @@ EXPORT_SYMBOL(mmap_action_prepare);
* Return: 0 on success, or error, at which point the VMA will be unmapped.
*/
int mmap_action_complete(struct mmap_action *action,
- struct vm_area_struct *vma)
+ struct vm_area_struct *vma)
{
int err = 0;
@@ -1424,6 +1425,69 @@ int mmap_action_complete(struct mmap_action *action,
return 0;
}
EXPORT_SYMBOL(mmap_action_complete);
+#else
+void mmap_action_prepare(struct mmap_action *action,
+ struct vm_area_desc *desc)
+{
+ switch (action->type) {
+ case MMAP_NOTHING:
+ case MMAP_CUSTOM_ACTION:
+ break;
+ case MMAP_REMAP_PFN:
+ case MMAP_INSERT_MIXED:
+ case MMAP_INSERT_MIXED_PAGES:
+ WARN_ON_ONCE(1); /* nommu cannot handle these. */
+ break;
+ }
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+int mmap_action_complete(struct mmap_action *action,
+ struct vm_area_struct *vma)
+{
+ int err = 0;
+
+ switch (action->type) {
+ case MMAP_NOTHING:
+ break;
+ case MMAP_REMAP_PFN:
+ case MMAP_INSERT_MIXED:
+ case MMAP_INSERT_MIXED_PAGES:
+ WARN_ON_ONCE(1); /* nommu cannot handle these. */
+
+ break;
+ case MMAP_CUSTOM_ACTION:
+ err = action->custom.action_hook(vma);
+ break;
+ }
+
+ /*
+ * If an error occurs, unmap the VMA altogether and return an error. We
+ * only clear the newly allocated VMA, since this function is only
+ * invoked if we do NOT merge, so we only clean up the VMA we created.
+ */
+ if (err) {
+ const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+ do_munmap(current->mm, vma->vm_start, len, NULL);
+
+ if (action->error_hook) {
+ /* We may want to filter the error. */
+ err = action->error_hook(err);
+
+ /* The caller should not clear the error. */
+ VM_WARN_ON_ONCE(!err);
+ }
+ return err;
+ }
+
+ if (action->success_hook)
+ err = action->success_hook(vma);
+
+ return err;
+}
+EXPORT_SYMBOL(mmap_action_complete);
+#endif
#ifdef CONFIG_MMU
/**
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
2025-09-11 22:07 ` Reinette Chatre
2025-09-12 10:25 ` Lorenzo Stoakes
@ 2025-09-13 22:54 ` Chris Mason
2025-09-15 9:56 ` Lorenzo Stoakes
2025-09-15 10:09 ` Lorenzo Stoakes
2025-09-15 12:11 ` Jason Gunthorpe
4 siblings, 1 reply; 55+ messages in thread
From: Chris Mason @ 2025-09-13 22:54 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Hi Lorenzo,
On 9/10/25 4:22 PM, Lorenzo Stoakes wrote:
> Some drivers/filesystems need to perform additional tasks after the VMA is
> set up. This is typically in the form of pre-population.
>
> The forms of pre-population most likely to be performed are a PFN remap or
> insertion of a mixed map, so we provide this functionality, ensuring that
> we perform the appropriate actions at the appropriate time - that is
> setting flags at the point of .mmap_prepare, and performing the actual
> remap at the point at which the VMA is fully established.
>
> This prevents the driver from doing anything too crazy with a VMA at any
> stage, and we retain complete control over how the mm functionality is
> applied.
>
> Unfortunately callers still often require some kind of custom action, so
> we add an optional success/error hook to allow the caller to do something
> after the action has succeeded or failed.
>
> This is done at the point when the VMA has already been established, so the
> harm that can be done is limited.
>
> The error hook can be used to filter errors if necessary.
>
> We implement actions as abstracted from the vm_area_desc, so we provide the
> ability for custom hooks to invoke actions distinct from the vma
> descriptor.
>
> If any error arises on these final actions, we simply unmap the VMA
> altogether.
>
> Also update the stacked filesystem compatibility layer to utilise the
> action behaviour, and update the VMA tests accordingly.
>
> For drivers which perform truly custom logic, we provide a custom action
> hook which is invoked at the point of action execution.
>
> This can then, in turn, update the desc object and perform other actions,
> such as partially remapping ranges for instance. We export
> mmap_action_prepare() and mmap_action_complete() for drivers to do
> this.
>
> This is performed at a stage where the VMA is already established,
> immediately prior to mapping completion, so it is considerably less
> problematic than a general mmap hook.
>
> Note that at the point of the action being taken, the VMA is visible via
> the rmap, only the VMA write lock is held, so if anything needs to access
> the VMA, it is able to.
>
> Essentially the action is taken as if it were performed after the mapping,
> but is kept atomic with VMA state.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mm.h | 30 ++++++
> include/linux/mm_types.h | 61 ++++++++++++
> mm/util.c | 150 +++++++++++++++++++++++++++-
> mm/vma.c | 70 ++++++++-----
> tools/testing/vma/vma_internal.h | 164 ++++++++++++++++++++++++++++++-
> 5 files changed, 447 insertions(+), 28 deletions(-)
>
[ ... ]
> +/**
> + * mmap_action_complete - Execute VMA descriptor action.
> + * @action: The action to perform.
> + * @vma: The VMA to perform the action upon.
> + *
> + * Similar to mmap_action_prepare(), other than internal mm usage this is
> + * intended for mmap_prepare users who implement a custom hook - with this
> + * function being called from the custom hook itself.
> + *
> + * Return: 0 on success, or error, at which point the VMA will be unmapped.
> + */
> +int mmap_action_complete(struct mmap_action *action,
> + struct vm_area_struct *vma)
> +{
> + int err = 0;
> +
> + switch (action->type) {
> + case MMAP_NOTHING:
> + break;
> + case MMAP_REMAP_PFN:
> + VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
> + VM_REMAP_FLAGS);
> +
> + err = remap_pfn_range_complete(vma, action->remap.addr,
> + action->remap.pfn, action->remap.size,
> + action->remap.pgprot);
> +
> + break;
> + case MMAP_INSERT_MIXED:
> + {
> + unsigned long pgnum = 0;
> + unsigned long pfn = action->mixedmap.pfn;
> + unsigned long addr = action->mixedmap.addr;
> + unsigned long vaddr = vma->vm_start;
> +
> + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
> +
> + for (; pgnum < action->mixedmap.num_pages;
> + pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
> + vm_fault_t vmf;
> +
> + vmf = vmf_insert_mixed(vma, vaddr, addr);
^^^^^
Should this be pfn instead of addr?
-chris
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-13 22:54 ` Chris Mason
@ 2025-09-15 9:56 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 9:56 UTC (permalink / raw)
To: Chris Mason
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Sat, Sep 13, 2025 at 06:54:06PM -0400, Chris Mason wrote:
> Hi Lorenzo,
>
> On 9/10/25 4:22 PM, Lorenzo Stoakes wrote:
> > Some drivers/filesystems need to perform additional tasks after the VMA is
> > set up. This is typically in the form of pre-population.
> >
> > The forms of pre-population most likely to be performed are a PFN remap or
> > insertion of a mixed map, so we provide this functionality, ensuring that
> > we perform the appropriate actions at the appropriate time - that is
> > setting flags at the point of .mmap_prepare, and performing the actual
> > remap at the point at which the VMA is fully established.
> >
> > This prevents the driver from doing anything too crazy with a VMA at any
> > stage, and we retain complete control over how the mm functionality is
> > applied.
> >
> > Unfortunately callers still often require some kind of custom action, so
> > we add an optional success/error hook to allow the caller to do something
> > after the action has succeeded or failed.
> >
> > This is done at the point when the VMA has already been established, so the
> > harm that can be done is limited.
> >
> > The error hook can be used to filter errors if necessary.
> >
> > We implement actions as abstracted from the vm_area_desc, so we provide the
> > ability for custom hooks to invoke actions distinct from the vma
> > descriptor.
> >
> > If any error arises on these final actions, we simply unmap the VMA
> > altogether.
> >
> > Also update the stacked filesystem compatibility layer to utilise the
> > action behaviour, and update the VMA tests accordingly.
> >
> > For drivers which perform truly custom logic, we provide a custom action
> > hook which is invoked at the point of action execution.
> >
> > This can then, in turn, update the desc object and perform other actions,
> > such as partially remapping ranges for instance. We export
> > mmap_action_prepare() and mmap_action_complete() for drivers to do
> > this.
> >
> > This is performed at a stage where the VMA is already established,
> > immediately prior to mapping completion, so it is considerably less
> > problematic than a general mmap hook.
> >
> > Note that at the point of the action being taken, the VMA is visible via
> > the rmap, only the VMA write lock is held, so if anything needs to access
> > the VMA, it is able to.
> >
> > Essentially the action is taken as if it were performed after the mapping,
> > but is kept atomic with VMA state.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > include/linux/mm.h | 30 ++++++
> > include/linux/mm_types.h | 61 ++++++++++++
> > mm/util.c | 150 +++++++++++++++++++++++++++-
> > mm/vma.c | 70 ++++++++-----
> > tools/testing/vma/vma_internal.h | 164 ++++++++++++++++++++++++++++++-
> > 5 files changed, 447 insertions(+), 28 deletions(-)
> >
>
> [ ... ]
>
> > +/**
> > + * mmap_action_complete - Execute VMA descriptor action.
> > + * @action: The action to perform.
> > + * @vma: The VMA to perform the action upon.
> > + *
> > + * Similar to mmap_action_prepare(), other than internal mm usage this is
> > + * intended for mmap_prepare users who implement a custom hook - with this
> > + * function being called from the custom hook itself.
> > + *
> > + * Return: 0 on success, or error, at which point the VMA will be unmapped.
> > + */
> > +int mmap_action_complete(struct mmap_action *action,
> > + struct vm_area_struct *vma)
> > +{
> > + int err = 0;
> > +
> > + switch (action->type) {
> > + case MMAP_NOTHING:
> > + break;
> > + case MMAP_REMAP_PFN:
> > + VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
> > + VM_REMAP_FLAGS);
> > +
> > + err = remap_pfn_range_complete(vma, action->remap.addr,
> > + action->remap.pfn, action->remap.size,
> > + action->remap.pgprot);
> > +
> > + break;
> > + case MMAP_INSERT_MIXED:
> > + {
> > + unsigned long pgnum = 0;
> > + unsigned long pfn = action->mixedmap.pfn;
> > + unsigned long addr = action->mixedmap.addr;
> > + unsigned long vaddr = vma->vm_start;
> > +
> > + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
> > +
> > + for (; pgnum < action->mixedmap.num_pages;
> > + pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
> > + vm_fault_t vmf;
> > +
> > + vmf = vmf_insert_mixed(vma, vaddr, addr);
> ^^^^^
> Should this be pfn instead of addr?
Yeah, sigh, this is a direct product of cramfs seemingly having a bug where it
was passing PAs and not PFNs.
I thought I had fixed this but clearly I missed this here.
Let me send a fix-patch!
>
> -chris
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
` (2 preceding siblings ...)
2025-09-13 22:54 ` Chris Mason
@ 2025-09-15 10:09 ` Lorenzo Stoakes
2025-09-15 12:11 ` Jason Gunthorpe
4 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 10:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Hi Andrew,
Could you apply the below fixpatch?
Thanks, Lorenzo
----8<----
From 35b96b949b44397c744b18f10b40a9989d4a92d2 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Mon, 15 Sep 2025 11:01:06 +0100
Subject: [PATCH] mm: fix incorrect mixedmap implementation
This was typo'd due to staring too long at the cramfs implementation.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/util.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index 9bfef9509d35..23a2ec675344 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1364,15 +1364,14 @@ int mmap_action_complete(struct mmap_action *action,
unsigned long pgnum = 0;
unsigned long pfn = action->mixedmap.pfn;
unsigned long addr = action->mixedmap.addr;
- unsigned long vaddr = vma->vm_start;
VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
for (; pgnum < action->mixedmap.num_pages;
- pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
+ pgnum++, pfn++, addr += PAGE_SIZE) {
vm_fault_t vmf;
- vmf = vmf_insert_mixed(vma, vaddr, addr);
+ vmf = vmf_insert_mixed(vma, addr, pfn);
if (vmf & VM_FAULT_ERROR) {
err = vm_fault_to_errno(vmf, 0);
break;
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
` (3 preceding siblings ...)
2025-09-15 10:09 ` Lorenzo Stoakes
@ 2025-09-15 12:11 ` Jason Gunthorpe
2025-09-15 12:23 ` Lorenzo Stoakes
4 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 12:11 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> +static inline void mmap_action_remap(struct mmap_action *action,
> + unsigned long addr, unsigned long pfn, unsigned long size,
> + pgprot_t pgprot)
> +{
> + action->type = MMAP_REMAP_PFN;
> +
> + action->remap.addr = addr;
> + action->remap.pfn = pfn;
> + action->remap.size = size;
> + action->remap.pgprot = pgprot;
> +}
These helpers that drivers are supposed to call really should have kdocs.
Especially since 'addr' is sort of ambiguous.
And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
we be strongly discouraging using anything other than
vma->vm_page_prot as the last argument?
I'd probably also have a small helper wrapper for the very common case
of whole vma:
/* Fill the entire VMA with pfns starting at pfn. Caller must have
* already checked desc has an appropriate size */
mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)
It is not normal for a driver to partially populate a VMA, let's call
those out as something weird.
> +struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
> + unsigned long addr, unsigned long num_pages)
> +{
> + struct page **pages;
> +
> + pages = kmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
> + if (!pages)
> + return NULL;
This allocation seems like a shame, I doubt many places actually need
it... A callback to get each pfn would be better?
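Rough sketch of the kind of thing I mean (hypothetical and untested - the
get_pfn field and typedef don't exist in this series):

	/* Called once per page; returns the pfn to insert at pgoff. */
	typedef unsigned long (*mmap_get_pfn_t)(struct vm_area_struct *vma,
						unsigned long pgoff);

	static inline void mmap_action_mixedmap_cb(struct mmap_action *action,
			unsigned long num_pages, mmap_get_pfn_t get_pfn)
	{
		action->type = MMAP_INSERT_MIXED;
		action->mixedmap.num_pages = num_pages;
		action->mixedmap.get_pfn = get_pfn;
	}

Then mmap_action_complete() would invoke get_pfn() per page rather than
walking a kmalloc'd array.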
Jason
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 12:11 ` Jason Gunthorpe
@ 2025-09-15 12:23 ` Lorenzo Stoakes
2025-09-15 12:42 ` Jason Gunthorpe
0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 12:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> > +static inline void mmap_action_remap(struct mmap_action *action,
> > + unsigned long addr, unsigned long pfn, unsigned long size,
> > + pgprot_t pgprot)
> > +{
> > + action->type = MMAP_REMAP_PFN;
> > +
> > + action->remap.addr = addr;
> > + action->remap.pfn = pfn;
> > + action->remap.size = size;
> > + action->remap.pgprot = pgprot;
> > +}
>
> These helpers drivers are supposed to call really should have kdocs.
>
> Especially since 'addr' is sort of ambigous.
OK.
>
> And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
> we be strongly discouraging using anything other than
> vma->vm_page_prot as the last argument?
I need to abstract desc from action so custom handlers can perform
sub-actions. It's unfortunate but there we go.
Otherwise there'd be horrible confusion passing around a desc that has an
action in it that you then ignore. Better to abstract the concept of an
action altogether.
>
> I'd probably also have a small helper wrapper for the very common case
> of whole vma:
>
> /* Fill the entire VMA with pfns starting at pfn. Caller must have
> * already checked desc has an appropriate size */
> mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)
See above re: desc vs. action.
>
> It is not normal for a driver to partially populate a VMA, lets call
> those out as something weird.
>
> > +struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
> > + unsigned long addr, unsigned long num_pages)
> > +{
> > + struct page **pages;
> > +
> > + pages = kmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
> > + if (!pages)
> > + return NULL;
>
> This allocation seems like a shame, I doubt many places actually need
> it .. A callback to get each pfn would be better?
It'd be hard to get right the context that would need to be supplied to
the callback.
In kcov's case it'd be kcov->area + an offset.
So we'd need an offset parameter, the struct file *, whatever else to be
passed.
And then we'll find a driver where that doesn't work and we're screwed.
I don't think optimising for mmap setup is really important.
We can always go back and refactor things later once this pattern is
established.
And again with ~230 odd drivers to update, I'd rather keep things as simple
as possible for now.
>
> Jason
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 12:23 ` Lorenzo Stoakes
@ 2025-09-15 12:42 ` Jason Gunthorpe
2025-09-15 12:54 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 12:42 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 01:23:30PM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> > > +static inline void mmap_action_remap(struct mmap_action *action,
> > > + unsigned long addr, unsigned long pfn, unsigned long size,
> > > + pgprot_t pgprot)
> > > +{
> > > + action->type = MMAP_REMAP_PFN;
> > > +
> > > + action->remap.addr = addr;
> > > + action->remap.pfn = pfn;
> > > + action->remap.size = size;
> > > + action->remap.pgprot = pgprot;
> > > +}
> >
> > These helpers drivers are supposed to call really should have kdocs.
> >
> > Especially since 'addr' is sort of ambigous.
>
> OK.
>
> >
> > And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
> > we be strongly discouraging using anything other than
> > vma->vm_page_prot as the last argument?
>
> I need to abstract desc from action so custom handlers can perform
> sub-actions. It's unfortunate but there we go.
Why? I don't see this as required.
Just mark the functions as manipulating the action by using 'action'
in the function name.
> > I'd probably also have a small helper wrapper for the very common case
> > of whole vma:
> >
> > /* Fill the entire VMA with pfns starting at pfn. Caller must have
> > * already checked desc has an appropriate size */
> > mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)
>
> See above re: desc vs. action.
Yet, this is the API most places actually want.
> It'd be hard to know how to get the context right that'd need to be supplied to
> the callback.
>
> In kcov's case it'd be kcov->area + an offset.
Just use pgoff
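i.e. the callback can compute everything it needs from that, roughly
(hand-waving kcov's locking; kcov->area is its vmalloc'd buffer):

	return vmalloc_to_pfn(kcov->area + (pgoff << PAGE_SHIFT));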
> So we'd need an offset parameter, the struct file *, whatever else to be
> passed.
Yes
> And then we'll find a driver where that doesn't work and we're screwed.
Bah, you keep saying that but we also may never even find one.
Jason
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 12:42 ` Jason Gunthorpe
@ 2025-09-15 12:54 ` Lorenzo Stoakes
2025-09-15 13:11 ` Jason Gunthorpe
0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 12:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 09:42:59AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 01:23:30PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> > > > +static inline void mmap_action_remap(struct mmap_action *action,
> > > > + unsigned long addr, unsigned long pfn, unsigned long size,
> > > > + pgprot_t pgprot)
> > > > +{
> > > > + action->type = MMAP_REMAP_PFN;
> > > > +
> > > > + action->remap.addr = addr;
> > > > + action->remap.pfn = pfn;
> > > > + action->remap.size = size;
> > > > + action->remap.pgprot = pgprot;
> > > > +}
> > >
> > > These helpers drivers are supposed to call really should have kdocs.
> > >
> > > Especially since 'addr' is sort of ambigous.
> >
> > OK.
> >
> > >
> > > And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
> > > we be strongly discouraging using anything other than
> > > vma->vm_page_prot as the last argument?
> >
> > I need to abstract desc from action so custom handlers can perform
> > sub-actions. It's unfortunate but there we go.
>
> Why? I don't see this as required
>
> Just mark the functions as manipulating the action using the 'action'
> in the fuction name.
Because sub-callers that partially map using one method and partially map
using another would then need to have a desc too, and would have to 'just
know' which fields to update or artificially set up.
The vmcore case does something like this.
Instead, we have actions where it's 100% clear what's going to happen.
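To illustrate, a stripped-down sketch of the pattern (pfn/size stand in for
whatever ranges the driver computed - this isn't the actual vmcore code):

	static int foo_post_mmap(struct vm_area_struct *vma)
	{
		struct mmap_action action;
		int err;

		/* Remap the PFN-backed portion of the VMA... */
		mmap_action_remap(&action, vma->vm_start, pfn, size,
				  vma->vm_page_prot);
		err = mmap_action_complete(&action, vma);
		if (err)
			return err;

		/* ...then map the page-backed remainder the same way,
		 * via a second local action. */
		return 0;
	}

A local action makes each sub-step explicit; with desc-based helpers you'd
have to fake up a desc just to carry these fields.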
>
> > > I'd probably also have a small helper wrapper for the very common case
> > > of whole vma:
> > >
> > > /* Fill the entire VMA with pfns starting at pfn. Caller must have
> > > * already checked desc has an appropriate size */
> > > mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)
> >
> > See above re: desc vs. action.
>
> Yet, this is the API most places actually want.
>
> > It'd be hard to know how to get the context right that'd need to be supplied to
> > the callback.
> >
> > In kcov's case it'd be kcov->area + an offset.
>
> Just use pgoff
>
> > So we'd need an offset parameter, the struct file *, whatever else to be
> > passed.
>
> Yes
>
> > And then we'll find a driver where that doesn't work and we're screwed.
>
> Bah, you keep saying that but we also may never even find one.
OK, let me try something like this, then. I guess I can update it later if
we discover such a driver.
>
> Jason
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 12:54 ` Lorenzo Stoakes
@ 2025-09-15 13:11 ` Jason Gunthorpe
2025-09-15 13:51 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 13:11 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 01:54:05PM +0100, Lorenzo Stoakes wrote:
> > Just mark the functions as manipulating the action using the 'action'
> > in the fuction name.
>
> Because now sub-callers that partially map using one method and partially map
> using another now need to have a desc too that they have to 'just know' which
> fields to update or artificially set up.
Huh? There is only one desc->action, how can you have more than one
action with this scheme?
One action is the right thing anyhow, we can't meaningfully mix
different action types in the same VMA. That's nonsense.
You may need more flexible ways to get the address lists down the road
because not every driver will be contiguous, but that should still be
one action.
> The vmcore case does something like this.
vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
helpers just aren't good for what mixedmap needs. Mixed map needs a
list of physical pfns with a bit indicating whether they are "special" or
not. Whether you do it with a callback or a kmalloc allocation doesn't
matter.
vmcore would then populate that list with its mixture of special and
non-special memory and do a single mixedmap action.
I think this series should drop the mixedmap stuff, it is the most
complicated action type. A vmalloc_user action is better for kcov.
And maybe that is just a comment overall. This would be nicer if each
series focused on adding one action with a three-four mmap users
converted to use it as an example case.
Eg there are not that many places calling vmalloc_user(), a single
series could convert alot of them.
If you did it this way we'd discover that there are already
helpers for vmalloc_user():
return remap_vmalloc_range(vma, mdev_state->memblk, 0);
And kcov looks buggy to not be using it already. The above gets the
VMA type right and doesn't force mixedmap :)
Then the series goals are a bit better we can actually fully convert
and remove things like remap_vmalloc_range() in single series. That
looks feasible to me.
Jason
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 13:11 ` Jason Gunthorpe
@ 2025-09-15 13:51 ` Lorenzo Stoakes
2025-09-15 14:34 ` Jason Gunthorpe
0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 13:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 10:11:42AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 01:54:05PM +0100, Lorenzo Stoakes wrote:
> > > Just mark the functions as manipulating the action using the 'action'
> > > in the fuction name.
> >
> > Because now sub-callers that partially map using one method and partially map
> > using another now need to have a desc too that they have to 'just know' which
> > fields to update or artificially set up.
>
> Huh? There is only on desc->action, how can you have more than one
> action with this scheme?
Because you use a custom hook that can in turn perform actions? As I've
implemented for vmcore?
>
> One action is the right thing anyhow, we can't meaningfully mix
> different action types in the same VMA. That's nonsense.
OK, except that's how 'true' mixed maps work though right? As vmcore is doing?
>
> You may need more flexible ways to get the address lists down the road
> because not every driver will be contiguous, but that should still be
> one action.
>
> > The vmcore case does something like this.
>
> vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
> helpers just aren't good for what mixedmap needs.. Mixed map need a
> list of physical pfns with a bit indicating if they are "special" or
> not. If you do it with a callback or a kmalloc allocation it doesn't
> matter.
Well it's a mix of actions to accommodate PFNs and normal pages as
implemented via a custom hook that can invoke each.
>
> vmcore would then populate that list with its mixture of special and
> non-sepcial memory and do a single mixedmem action.
I'm confused as to why you say a helper would be no good here, then go on
to delineate how a helper could work...
>
> I think this series should drop the mixedmap stuff, it is the most
> complicated action type. A vmalloc_user action is better for kcov.
Fine, I mean if we could find a way to explicitly just give a list of stuff
to map that'd be _great_ vs. having a custom hook.
If we can avoid custom hooks altogether that'd be ideal.
Anyway I'll drop the mixed map stuff, fine.
>
> And maybe that is just a comment overall. This would be nicer if each
> series focused on adding one action with a three-four mmap users
> converted to use it as an example case.
In future series I'll try to group by the action type.
This series is _setting up this to be a possibility at all_.
The idea was that I could put fundamentals in that should cover most cases,
so I could then go on to implement them in (relative) peace...
I mean once I drop the mixed map stuff, and refactor to vmalloc_user(),
then we are pretty much doing that, modulo a single vmalloc_user() case.
So maybe I should drop the vmalloc_user() bits too and make this a
remap-only change...
But I don't want to tackle _all_ remap cases here.
I want to add this functionality in and have it ready for next cycle (yeah,
not so sure about that now...) so I can then do follow-up work.
Am trying to do it before Kernel Recipes, which I'll be at, and then a (very
very very needed) couple of weeks' vacation.
Anyway maybe if I simplify there's still a shot at this landing in time...
>
> Eg there are not that many places calling vmalloc_user(), a single
> series could convert alot of them.
>
> If you did it this way we'd discover that there are already
> helpers for vmalloc_user():
>
> return remap_vmalloc_range(vma, mdev_state->memblk, 0);
>
> And kcov looks buggy to not be using it already. The above gets the
> VMA type right and doesn't force mixedmap :)
Right, I mean maybe.
If I can take care of low hanging fruit relatively easily then maybe it'll
be more practical to refactor the 'odd ones out'.
>
> Then the series goals are a bit better we can actually fully convert
> and remove things like remap_vmalloc_range() in single series. That
> looks feasible to me.
Right.
I'd love to drop unused stuff earlier, so _that_ is not an unreasonable
requirement.
>
> Jason
I guess I'll do a respin then as per above.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 13:51 ` Lorenzo Stoakes
@ 2025-09-15 14:34 ` Jason Gunthorpe
2025-09-15 15:04 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 14:34 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 02:51:52PM +0100, Lorenzo Stoakes wrote:
> > vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
> > helpers just aren't good for what mixedmap needs.. Mixed map need a
> > list of physical pfns with a bit indicating if they are "special" or
> > not. If you do it with a callback or a kmalloc allocation it doesn't
> > matter.
>
> > Well it's a mix of actions to accommodate PFNs and normal pages as
> implemented via a custom hook that can invoke each.
No it's not a mix of actions. The mixedmap helpers are just
wrong for actual mixedmap usage:
+static inline void mmap_action_remap(struct mmap_action *action,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ pgprot_t pgprot)
+
+static inline void mmap_action_mixedmap(struct mmap_action *action,
+ unsigned long addr, unsigned long pfn, unsigned long num_pages)
Mixed map is a list of PFNs and a flag indicating whether each PFN is
special or not. That's what makes mixed map different from the other
mapping cases.
One action per VMA, and mixed map is handled by supporting the above
list in some way.
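e.g. something like (hypothetical):

	struct mixedmap_entry {
		unsigned long pfn;
		bool special;	/* raw PFN vs page-backed */
	};

with the action holding (or producing, via a callback) an array of these.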
> > I think this series should drop the mixedmap stuff, it is the most
> > complicated action type. A vmalloc_user action is better for kcov.
>
> Fine, I mean if we could find a way to explicitly just give a list of stuff
> to map that'd be _great_ vs. having a custom hook.
You already proposed to allocate memory to hold an array; I suggested
having a per-range callback. Either could work as an API for
mixedmap.
> So maybe I should drop the vmalloc_user() bits too and make this a
> remap-only change...
Sure
> But I don't want to tackle _all_ remap cases here.
Do 4-5 conversions or something to show the API is working. Things like my remark
to have a better helper that does whole-vma only should show up more
clearly with a few more conversions.
It is generally a good idea when doing these reworks to look across
all the use-case patterns and try to simplify them. This is why a
series per pattern is a good idea because you are saying you found a
pattern, and here are N examples of the pattern to prove it.
Eg if a huge number of drivers are just mmaping a linear range of
memory with a fixed pgoff then a helper to support exactly that
pattern with minimal driver code should be developed.
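e.g. with something like the mmap_action_remap_full() suggested above, such
a driver's hook collapses to (hypothetical device, untested):

	static int foo_mmap_prepare(struct vm_area_desc *desc)
	{
		struct foo_dev *foo = desc->file->private_data;

		if (desc->pgoff ||
		    vma_desc_size(desc) > resource_size(&foo->res))
			return -EINVAL;

		desc->vm_ops = &foo_vm_ops;
		mmap_action_remap_full(desc, PHYS_PFN(foo->res.start));
		return 0;
	}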
Like below, apparently vmalloc_user() is already a pattern and already
has a simplifying safe helper.
> Anyway maybe if I simplify there's still a shot at this landing in time...
Simplify is always good to help things get merged :)
> > Eg there are not that many places calling vmalloc_user(), a single
> > series could convert alot of them.
> >
> > If you did it this way we'd discover that there are already
> > helpers for vmalloc_user():
> >
> > return remap_vmalloc_range(vma, mdev_state->memblk, 0);
> >
> > And kcov looks buggy to not be using it already. The above gets the
> > VMA type right and doesn't force mixedmap :)
>
> Right, I mean maybe.
Maybe send out a single patch to change kcov to remap_vmalloc_range()
for this cycle? Answer the maybe?
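Presumably that ends up as little more than (untested, size/pgoff checks
elided):

	return remap_vmalloc_range(vma, kcov->area, 0);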
Jason
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
2025-09-15 14:34 ` Jason Gunthorpe
@ 2025-09-15 15:04 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 15:04 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 11:34:14AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 02:51:52PM +0100, Lorenzo Stoakes wrote:
> > > vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
> > > helpers just aren't good for what mixedmap needs.. Mixed map need a
> > > list of physical pfns with a bit indicating if they are "special" or
> > > not. If you do it with a callback or a kmalloc allocation it doesn't
> > > matter.
> >
> > Well it's a mix of actions to accommodate PFNs and normal pages as
> > implemented via a custom hook that can invoke each.
>
> No it's not a mix of actions. The mixedmap helpers are just
> wrong for actual mixedmap usage:
>
> +static inline void mmap_action_remap(struct mmap_action *action,
> + unsigned long addr, unsigned long pfn, unsigned long size,
> + pgprot_t pgprot)
> +
> +static inline void mmap_action_mixedmap(struct mmap_action *action,
> + unsigned long addr, unsigned long pfn, unsigned long num_pages)
>
> Mixed map is a list of PFNs and a flag if the PFN is special or
> not. That's what makes mixed map different from the other mapping
> cases.
>
> One action per VMA, and mixed map is handled by supporting the above
> list in some way.
I don't think any of the above is really useful for me to respond to, I
think you've misunderstood what I'm saying, but it doesn't really matter
because I agree that the interface you propose is better for mixed map.
>
> > > I think this series should drop the mixedmap stuff, it is the most
> > > complicated action type. A vmalloc_user action is better for kcov.
> >
> > Fine, I mean if we could find a way to explicitly just give a list of stuff
> > to map that'd be _great_ vs. having a custom hook.
>
> You already proposed to allocate memory to hold an array, I suggested
> to have a per-range callback. Either could work as an API for
> mixedmap.
Again, I think you've misunderstood me, but it's moot, because I agree,
this kind of interface is better.
>
> > So maybe I should drop the vmalloc_user() bits too and make this a
> > remap-only change...
>
> Sure
>
> > But I don't want to tackle _all_ remap cases here.
>
> Due 4-5 or something to show the API is working. Things like my remark
> to have a better helper that does whole-vma only should show up more
> clearly with a few more conversions.
I was trying to limit to mm or mm-adjacent as per the cover letter.
But sure I will do that.
>
> It is generally a good idea when doing these reworks to look across
It's not a rework :) the cover letter describes why I'm doing this.
> all the use-case patterns and try to simplify them. This is why a
> series per pattern is a good idea because you are saying you found a
> pattern, and here are N examples of the pattern to prove it.
>
> Eg if a huge number of drivers are just mmaping a linear range of
> memory with a fixed pgoff then a helper to support exactly that
> pattern with minimal driver code should be developed.
Fine in spirit, though let's also be pragmatic.
Again this isn't a refactoring exercise. But I agree we should try to get
the API right as best we can.
>
> Like below, apparently vmalloc_user() is already a pattern and already
> has a simplifying safe helper.
>
> > Anyway maybe if I simplify there's still a shot at this landing in time...
>
> Simplify is always good to help things get merged :)
Yup :)
>
> > > Eg there are not that many places calling vmalloc_user(), a single
> > > series could convert alot of them.
> > >
> > > If you did it this way we'd discover that there are already
> > > helpers for vmalloc_user():
> > >
> > > return remap_vmalloc_range(vma, mdev_state->memblk, 0);
> > >
> > > And kcov looks buggy to not be using it already. The above gets the
> > > VMA type right and doesn't force mixedmap :)
> >
> > Right, I mean maybe.
>
> Maybe send out a single patch to change kcov to remap_vmalloc_range()
> for this cycle? Answer the maybe?
Sure I can probably do that.
The question is time, because most of my days are full of review as per
my self-inflicted^W self-selected role as a maintainer.
This series will be the priority obviously :)
>
> Jason
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (7 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-11 8:55 ` Jan Kara
2025-09-10 20:22 ` [PATCH v2 10/16] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
` (7 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Now we have introduced the ability to specify that actions should be taken
after a VMA is established via the vm_area_desc->action field as specified
in mmap_prepare, update both the VFS documentation and the porting guide to
describe this.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
Documentation/filesystems/porting.rst | 5 +++++
Documentation/filesystems/vfs.rst | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 85f590254f07..6743ed0b9112 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1285,3 +1285,8 @@ rather than a VMA, as the VMA at this stage is not yet valid.
The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
+
+In nearly all cases, this is all that is required for a filesystem. However, if
+a filesystem needs to perform an operation such as pre-population of page tables,
+then that action can be specified in the vm_area_desc->action field, which can
+be configured using the mmap_action_*() helpers.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 486a91633474..9e96c46ee10e 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1236,6 +1236,10 @@ otherwise noted.
file-backed memory mapping, most notably establishing relevant
private state and VMA callbacks.
+ If further action such as pre-population of page tables is required,
+ this can be specified by the vm_area_desc->action field and related
+ parameters.
+
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread* Re: [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions
2025-09-10 20:22 ` [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
@ 2025-09-11 8:55 ` Jan Kara
2025-09-12 10:19 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2025-09-11 8:55 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed 10-09-25 21:22:04, Lorenzo Stoakes wrote:
> Now we have introduced the ability to specify that actions should be taken
> after a VMA is established via the vm_area_desc->action field as specified
> in mmap_prepare, update both the VFS documentation and the porting guide to
> describe this.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> Documentation/filesystems/porting.rst | 5 +++++
> Documentation/filesystems/vfs.rst | 4 ++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 85f590254f07..6743ed0b9112 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -1285,3 +1285,8 @@ rather than a VMA, as the VMA at this stage is not yet valid.
> The vm_area_desc provides the minimum required information for a filesystem
> to initialise state upon memory mapping of a file-backed region, and output
> parameters for the file system to set this state.
> +
> +In nearly all cases, this is all that is required for a filesystem. However, if
> +a filesystem needs to perform an operation such as pre-population of page tables,
> +then that action can be specified in the vm_area_desc->action field, which can
> +be configured using the mmap_action_*() helpers.
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 486a91633474..9e96c46ee10e 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -1236,6 +1236,10 @@ otherwise noted.
> file-backed memory mapping, most notably establishing relevant
> private state and VMA callbacks.
>
> + If further action such as pre-population of page tables is required,
> + this can be specified by the vm_area_desc->action field and related
> + parameters.
> +
> Note that the file operations are implemented by the specific
> filesystem in which the inode resides. When opening a device node
> (character or block special) most filesystems will call special
> --
> 2.51.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions
2025-09-11 8:55 ` Jan Kara
@ 2025-09-12 10:19 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-12 10:19 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Thu, Sep 11, 2025 at 10:55:42AM +0200, Jan Kara wrote:
> On Wed 10-09-25 21:22:04, Lorenzo Stoakes wrote:
> > Now we have introduced the ability to specify that actions should be taken
> > after a VMA is established via the vm_area_desc->action field as specified
> > in mmap_prepare, update both the VFS documentation and the porting guide to
> > describe this.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
Thanks for this and all previous tags! :)
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 10/16] mm/hugetlbfs: update hugetlbfs to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (8 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 09/16] doc: update porting, vfs documentation for mmap_prepare actions Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 11/16] mm: update mem char driver " Lorenzo Stoakes
` (6 subsequent siblings)
16 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Since we can now perform actions after the VMA is established via
mmap_prepare, use desc->action.success_hook to set up the hugetlb lock once
the VMA is set up.
We also make changes throughout hugetlbfs to make this possible.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/hugetlbfs/inode.c | 30 +++++++------
include/linux/hugetlb.h | 9 +++-
include/linux/hugetlb_inline.h | 15 ++++---
mm/hugetlb.c | 77 ++++++++++++++++++++--------------
4 files changed, 79 insertions(+), 52 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3cfdf4091001..026bcc65bb79 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -96,8 +96,9 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
#define PGOFF_LOFFT_MAX \
(((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
-static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *file = desc->file;
struct inode *inode = file_inode(file);
loff_t len, vma_len;
int ret;
@@ -112,8 +113,8 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap unwinds (may be important on powerpc
* and ia64).
*/
- vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
- vma->vm_ops = &hugetlb_vm_ops;
+ desc->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
+ desc->vm_ops = &hugetlb_vm_ops;
/*
* page based offset in vm_pgoff could be sufficiently large to
@@ -122,16 +123,16 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* sizeof(unsigned long). So, only check in those instances.
*/
if (sizeof(unsigned long) == sizeof(loff_t)) {
- if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
+ if (desc->pgoff & PGOFF_LOFFT_MAX)
return -EINVAL;
}
/* must be huge page aligned */
- if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
+ if (desc->pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
return -EINVAL;
- vma_len = (loff_t)(vma->vm_end - vma->vm_start);
- len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ vma_len = (loff_t)vma_desc_size(desc);
+ len = vma_len + ((loff_t)desc->pgoff << PAGE_SHIFT);
/* check for overflow */
if (len < vma_len)
return -EINVAL;
@@ -141,7 +142,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
ret = -ENOMEM;
- vm_flags = vma->vm_flags;
+ vm_flags = desc->vm_flags;
/*
* for SHM_HUGETLB, the pages are reserved in the shmget() call so skip
* reserving here. Note: only for SHM hugetlbfs file, the inode
@@ -151,17 +152,20 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
vm_flags |= VM_NORESERVE;
if (hugetlb_reserve_pages(inode,
- vma->vm_pgoff >> huge_page_order(h),
- len >> huge_page_shift(h), vma,
- vm_flags) < 0)
+ desc->pgoff >> huge_page_order(h),
+ len >> huge_page_shift(h), desc,
+ vm_flags) < 0)
goto out;
ret = 0;
- if (vma->vm_flags & VM_WRITE && inode->i_size < len)
+ if ((desc->vm_flags & VM_WRITE) && inode->i_size < len)
i_size_write(inode, len);
out:
inode_unlock(inode);
+ /* Allocate the VMA lock after we set it up. */
+ if (!ret)
+ desc->action.success_hook = hugetlb_vma_lock_alloc;
return ret;
}
@@ -1219,7 +1223,7 @@ static void init_once(void *foo)
static const struct file_operations hugetlbfs_file_operations = {
.read_iter = hugetlbfs_read_iter,
- .mmap = hugetlbfs_file_mmap,
+ .mmap_prepare = hugetlbfs_file_mmap_prepare,
.fsync = noop_fsync,
.get_unmapped_area = hugetlb_get_unmapped_area,
.llseek = default_llseek,
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 526d27e88b3b..b39f2b70ccab 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -150,8 +150,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
struct folio **foliop);
#endif /* CONFIG_USERFAULTFD */
long hugetlb_reserve_pages(struct inode *inode, long from, long to,
- struct vm_area_struct *vma,
- vm_flags_t vm_flags);
+ struct vm_area_desc *desc, vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long freed);
bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
@@ -280,6 +279,7 @@ bool is_hugetlb_entry_hwpoisoned(pte_t pte);
void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
void fixup_hugetlb_reservations(struct vm_area_struct *vma);
void hugetlb_split(struct vm_area_struct *vma, unsigned long addr);
+int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
#else /* !CONFIG_HUGETLB_PAGE */
@@ -466,6 +466,11 @@ static inline void fixup_hugetlb_reservations(struct vm_area_struct *vma)
static inline void hugetlb_split(struct vm_area_struct *vma, unsigned long addr) {}
+static inline int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+{
+ return 0;
+}
+
#endif /* !CONFIG_HUGETLB_PAGE */
#ifndef pgd_write
diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index 0660a03d37d9..a27aa0162918 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -2,22 +2,27 @@
#ifndef _LINUX_HUGETLB_INLINE_H
#define _LINUX_HUGETLB_INLINE_H
-#ifdef CONFIG_HUGETLB_PAGE
-
#include <linux/mm.h>
-static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
+#ifdef CONFIG_HUGETLB_PAGE
+
+static inline bool is_vm_hugetlb_flags(vm_flags_t vm_flags)
{
- return !!(vma->vm_flags & VM_HUGETLB);
+ return !!(vm_flags & VM_HUGETLB);
}
#else
-static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
+static inline bool is_vm_hugetlb_flags(vm_flags_t vm_flags)
{
return false;
}
#endif
+static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
+{
+ return is_vm_hugetlb_flags(vma->vm_flags);
+}
+
#endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d812ad8f0b9f..cb6eda43cb7f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -119,7 +119,6 @@ struct mutex *hugetlb_fault_mutex_table __ro_after_init;
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
unsigned long start, unsigned long end, bool take_locks);
@@ -417,17 +416,21 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
}
}
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+/*
+ * vma specific semaphore used for pmd sharing and fault/truncation
+ * synchronization
+ */
+int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
{
struct hugetlb_vma_lock *vma_lock;
/* Only establish in (flags) sharable vmas */
if (!vma || !(vma->vm_flags & VM_MAYSHARE))
- return;
+ return 0;
/* Should never get here with non-NULL vm_private_data */
if (vma->vm_private_data)
- return;
+ return -EINVAL;
vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
if (!vma_lock) {
@@ -442,13 +445,15 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
* allocation failure.
*/
pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
- return;
+ return -EINVAL;
}
kref_init(&vma_lock->refs);
init_rwsem(&vma_lock->rw_sema);
vma_lock->vma = vma;
vma->vm_private_data = vma_lock;
+
+ return 0;
}
/* Helper that removes a struct file_region from the resv_map cache and returns
@@ -1180,20 +1185,28 @@ static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
}
}
-static void set_vma_resv_map(struct vm_area_struct *vma, struct resv_map *map)
+static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
{
- VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
- VM_BUG_ON_VMA(vma->vm_flags & VM_MAYSHARE, vma);
+ VM_WARN_ON_ONCE_VMA(!is_vm_hugetlb_page(vma), vma);
+ VM_WARN_ON_ONCE_VMA(vma->vm_flags & VM_MAYSHARE, vma);
- set_vma_private_data(vma, (unsigned long)map);
+ set_vma_private_data(vma, get_vma_private_data(vma) | flags);
}
-static void set_vma_resv_flags(struct vm_area_struct *vma, unsigned long flags)
+static void set_vma_desc_resv_map(struct vm_area_desc *desc, struct resv_map *map)
{
- VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
- VM_BUG_ON_VMA(vma->vm_flags & VM_MAYSHARE, vma);
+ VM_WARN_ON_ONCE(!is_vm_hugetlb_flags(desc->vm_flags));
+ VM_WARN_ON_ONCE(desc->vm_flags & VM_MAYSHARE);
- set_vma_private_data(vma, get_vma_private_data(vma) | flags);
+ desc->private_data = map;
+}
+
+static void set_vma_desc_resv_flags(struct vm_area_desc *desc, unsigned long flags)
+{
+ VM_WARN_ON_ONCE(!is_vm_hugetlb_flags(desc->vm_flags));
+ VM_WARN_ON_ONCE(desc->vm_flags & VM_MAYSHARE);
+
+ desc->private_data = (void *)((unsigned long)desc->private_data | flags);
}
static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
@@ -1203,6 +1216,13 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
return (get_vma_private_data(vma) & flag) != 0;
}
+static bool is_vma_desc_resv_set(struct vm_area_desc *desc, unsigned long flag)
+{
+ VM_WARN_ON_ONCE(!is_vm_hugetlb_flags(desc->vm_flags));
+
+ return ((unsigned long)desc->private_data) & flag;
+}
+
bool __vma_private_lock(struct vm_area_struct *vma)
{
return !(vma->vm_flags & VM_MAYSHARE) &&
@@ -7225,9 +7245,9 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
*/
long hugetlb_reserve_pages(struct inode *inode,
- long from, long to,
- struct vm_area_struct *vma,
- vm_flags_t vm_flags)
+ long from, long to,
+ struct vm_area_desc *desc,
+ vm_flags_t vm_flags)
{
long chg = -1, add = -1, spool_resv, gbl_resv;
struct hstate *h = hstate_inode(inode);
@@ -7242,12 +7262,6 @@ long hugetlb_reserve_pages(struct inode *inode,
return -EINVAL;
}
- /*
- * vma specific semaphore used for pmd sharing and fault/truncation
- * synchronization
- */
- hugetlb_vma_lock_alloc(vma);
-
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
@@ -7260,9 +7274,9 @@ long hugetlb_reserve_pages(struct inode *inode,
* Shared mappings base their reservation on the number of pages that
* are already allocated on behalf of the file. Private mappings need
* to reserve the full area even if read-only as mprotect() may be
- * called to make the mapping read-write. Assume !vma is a shm mapping
+ * called to make the mapping read-write. Assume !desc is a shm mapping
*/
- if (!vma || vma->vm_flags & VM_MAYSHARE) {
+ if (!desc || desc->vm_flags & VM_MAYSHARE) {
/*
* resv_map can not be NULL as hugetlb_reserve_pages is only
* called for inodes for which resv_maps were created (see
@@ -7279,8 +7293,8 @@ long hugetlb_reserve_pages(struct inode *inode,
chg = to - from;
- set_vma_resv_map(vma, resv_map);
- set_vma_resv_flags(vma, HPAGE_RESV_OWNER);
+ set_vma_desc_resv_map(desc, resv_map);
+ set_vma_desc_resv_flags(desc, HPAGE_RESV_OWNER);
}
if (chg < 0)
@@ -7290,7 +7304,7 @@ long hugetlb_reserve_pages(struct inode *inode,
chg * pages_per_huge_page(h), &h_cg) < 0)
goto out_err;
- if (vma && !(vma->vm_flags & VM_MAYSHARE) && h_cg) {
+ if (desc && !(desc->vm_flags & VM_MAYSHARE) && h_cg) {
/* For private mappings, the hugetlb_cgroup uncharge info hangs
* of the resv_map.
*/
@@ -7324,7 +7338,7 @@ long hugetlb_reserve_pages(struct inode *inode,
* consumed reservations are stored in the map. Hence, nothing
* else has to be done for private mappings here
*/
- if (!vma || vma->vm_flags & VM_MAYSHARE) {
+ if (!desc || desc->vm_flags & VM_MAYSHARE) {
add = region_add(resv_map, from, to, regions_needed, h, h_cg);
if (unlikely(add < 0)) {
@@ -7378,16 +7392,15 @@ long hugetlb_reserve_pages(struct inode *inode,
hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
chg * pages_per_huge_page(h), h_cg);
out_err:
- hugetlb_vma_lock_free(vma);
- if (!vma || vma->vm_flags & VM_MAYSHARE)
+ if (!desc || desc->vm_flags & VM_MAYSHARE)
/* Only call region_abort if the region_chg succeeded but the
* region_add failed or didn't run.
*/
if (chg >= 0 && add < 0)
region_abort(resv_map, from, to, regions_needed);
- if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+ if (desc && is_vma_desc_resv_set(desc, HPAGE_RESV_OWNER)) {
kref_put(&resv_map->refs, resv_map_release);
- set_vma_resv_map(vma, NULL);
+ set_vma_desc_resv_map(desc, NULL);
}
return chg < 0 ? chg : add < 0 ? add : -EINVAL;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread* [PATCH v2 11/16] mm: update mem char driver to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (9 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 10/16] mm/hugetlbfs: update hugetlbfs to use mmap_prepare Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-18 19:11 ` Chris Mason
2025-09-10 20:22 ` [PATCH v2 12/16] mm: update resctl " Lorenzo Stoakes
` (5 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Update the mem char driver (backing /dev/mem and /dev/zero) to use the
f_op->mmap_prepare hook rather than the deprecated f_op->mmap.
The /dev/zero implementation has a rather unique and concerning
characteristic in that it converts MAP_PRIVATE mmap() mappings to anonymous
mappings when they are, in fact, not.
The new f_op->mmap_prepare() can support this, but rather than introducing
a helper function to perform this hack (and risk introducing other users),
simply set desc->vm_ops to NULL here and add a comment describing what's
going on.
We also introduce shmem_zero_setup_desc() to allow for the shared mapping
case via an f_op->mmap_prepare() hook, and generalise the code between this
and shmem_zero_setup().
We also use the desc->action.error_hook to filter the remap error to
-EAGAIN to keep behaviour consistent.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
drivers/char/mem.c | 75 ++++++++++++++++++++++------------------
include/linux/shmem_fs.h | 3 +-
mm/shmem.c | 40 ++++++++++++++++-----
3 files changed, 76 insertions(+), 42 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 34b815901b20..23194788ee41 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -304,13 +304,13 @@ static unsigned zero_mmap_capabilities(struct file *file)
}
/* can't do an in-place private mapping if there's no MMU */
-static inline int private_mapping_ok(struct vm_area_struct *vma)
+static inline int private_mapping_ok(struct vm_area_desc *desc)
{
- return is_nommu_shared_mapping(vma->vm_flags);
+ return is_nommu_shared_mapping(desc->vm_flags);
}
#else
-static inline int private_mapping_ok(struct vm_area_struct *vma)
+static inline int private_mapping_ok(struct vm_area_desc *desc)
{
return 1;
}
@@ -322,46 +322,50 @@ static const struct vm_operations_struct mmap_mem_ops = {
#endif
};
-static int mmap_mem(struct file *file, struct vm_area_struct *vma)
+static int mmap_filter_error(int err)
{
- size_t size = vma->vm_end - vma->vm_start;
- phys_addr_t offset = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
+ return -EAGAIN;
+}
+
+static int mmap_mem_prepare(struct vm_area_desc *desc)
+{
+ struct file *file = desc->file;
+ const size_t size = vma_desc_size(desc);
+ const phys_addr_t offset = (phys_addr_t)desc->pgoff << PAGE_SHIFT;
/* Does it even fit in phys_addr_t? */
- if (offset >> PAGE_SHIFT != vma->vm_pgoff)
+ if (offset >> PAGE_SHIFT != desc->pgoff)
return -EINVAL;
/* It's illegal to wrap around the end of the physical address space. */
if (offset + (phys_addr_t)size - 1 < offset)
return -EINVAL;
- if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
+ if (!valid_mmap_phys_addr_range(desc->pgoff, size))
return -EINVAL;
- if (!private_mapping_ok(vma))
+ if (!private_mapping_ok(desc))
return -ENOSYS;
- if (!range_is_allowed(vma->vm_pgoff, size))
+ if (!range_is_allowed(desc->pgoff, size))
return -EPERM;
- if (!phys_mem_access_prot_allowed(file, vma->vm_pgoff, size,
- &vma->vm_page_prot))
+ if (!phys_mem_access_prot_allowed(file, desc->pgoff, size,
+ &desc->page_prot))
return -EINVAL;
- vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
- size,
- vma->vm_page_prot);
+ desc->page_prot = phys_mem_access_prot(file, desc->pgoff,
+ size,
+ desc->page_prot);
- vma->vm_ops = &mmap_mem_ops;
+ desc->vm_ops = &mmap_mem_ops;
/* Remap-pfn-range will mark the range VM_IO */
- if (remap_pfn_range(vma,
- vma->vm_start,
- vma->vm_pgoff,
- size,
- vma->vm_page_prot)) {
- return -EAGAIN;
- }
+ mmap_action_remap(&desc->action, desc->start, desc->pgoff, size,
+ desc->page_prot);
+ /* We filter remap errors to -EAGAIN. */
+ desc->action.error_hook = mmap_filter_error;
+
return 0;
}
@@ -501,14 +505,18 @@ static ssize_t read_zero(struct file *file, char __user *buf,
return cleared;
}
-static int mmap_zero(struct file *file, struct vm_area_struct *vma)
+static int mmap_prepare_zero(struct vm_area_desc *desc)
{
#ifndef CONFIG_MMU
return -ENOSYS;
#endif
- if (vma->vm_flags & VM_SHARED)
- return shmem_zero_setup(vma);
- vma_set_anonymous(vma);
+ if (desc->vm_flags & VM_SHARED)
+ return shmem_zero_setup_desc(desc);
+ /*
+ * This is a highly unique situation where we mark a MAP_PRIVATE mapping
+ * of /dev/zero anonymous, despite it not being.
+ */
+ desc->vm_ops = NULL;
return 0;
}
@@ -526,10 +534,11 @@ static unsigned long get_unmapped_area_zero(struct file *file,
{
if (flags & MAP_SHARED) {
/*
- * mmap_zero() will call shmem_zero_setup() to create a file,
- * so use shmem's get_unmapped_area in case it can be huge;
- * and pass NULL for file as in mmap.c's get_unmapped_area(),
- * so as not to confuse shmem with our handle on "/dev/zero".
+ * mmap_prepare_zero() will call shmem_zero_setup() to create a
+ * file, so use shmem's get_unmapped_area in case it can be
+ * huge; and pass NULL for file as in mmap.c's
+ * get_unmapped_area(), so as not to confuse shmem with our
+ * handle on "/dev/zero".
*/
return shmem_get_unmapped_area(NULL, addr, len, pgoff, flags);
}
@@ -632,7 +641,7 @@ static const struct file_operations __maybe_unused mem_fops = {
.llseek = memory_lseek,
.read = read_mem,
.write = write_mem,
- .mmap = mmap_mem,
+ .mmap_prepare = mmap_mem_prepare,
.open = open_mem,
#ifndef CONFIG_MMU
.get_unmapped_area = get_unmapped_area_mem,
@@ -668,7 +677,7 @@ static const struct file_operations zero_fops = {
.write_iter = write_iter_zero,
.splice_read = copy_splice_read,
.splice_write = splice_write_zero,
- .mmap = mmap_zero,
+ .mmap_prepare = mmap_prepare_zero,
.get_unmapped_area = get_unmapped_area_zero,
#ifndef CONFIG_MMU
.mmap_capabilities = zero_mmap_capabilities,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 0e47465ef0fd..5b368f9549d6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -94,7 +94,8 @@ extern struct file *shmem_kernel_file_setup(const char *name, loff_t size,
unsigned long flags);
extern struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt,
const char *name, loff_t size, unsigned long flags);
-extern int shmem_zero_setup(struct vm_area_struct *);
+int shmem_zero_setup(struct vm_area_struct *vma);
+int shmem_zero_setup_desc(struct vm_area_desc *desc);
extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
diff --git a/mm/shmem.c b/mm/shmem.c
index 990e33c6a776..cb6ff00eb4cb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5893,14 +5893,9 @@ struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt, const char *name,
}
EXPORT_SYMBOL_GPL(shmem_file_setup_with_mnt);
-/**
- * shmem_zero_setup - setup a shared anonymous mapping
- * @vma: the vma to be mmapped is prepared by do_mmap
- */
-int shmem_zero_setup(struct vm_area_struct *vma)
+static struct file *__shmem_zero_setup(unsigned long start, unsigned long end, vm_flags_t vm_flags)
{
- struct file *file;
- loff_t size = vma->vm_end - vma->vm_start;
+ loff_t size = end - start;
/*
* Cloning a new file under mmap_lock leads to a lock ordering conflict
@@ -5908,7 +5903,17 @@ int shmem_zero_setup(struct vm_area_struct *vma)
* accessible to the user through its mapping, use S_PRIVATE flag to
* bypass file security, in the same way as shmem_kernel_file_setup().
*/
- file = shmem_kernel_file_setup("dev/zero", size, vma->vm_flags);
+ return shmem_kernel_file_setup("dev/zero", size, vm_flags);
+}
+
+/**
+ * shmem_zero_setup - setup a shared anonymous mapping
+ * @vma: the vma to be mmapped is prepared by do_mmap
+ */
+int shmem_zero_setup(struct vm_area_struct *vma)
+{
+ struct file *file = __shmem_zero_setup(vma->vm_start, vma->vm_end, vma->vm_flags);
+
if (IS_ERR(file))
return PTR_ERR(file);
@@ -5920,6 +5925,25 @@ int shmem_zero_setup(struct vm_area_struct *vma)
return 0;
}
+/**
+ * shmem_zero_setup_desc - same as shmem_zero_setup, but determined by VMA
+ * descriptor for convenience.
+ * @desc: Describes VMA
+ * Returns: 0 on success, or error
+ */
+int shmem_zero_setup_desc(struct vm_area_desc *desc)
+{
+ struct file *file = __shmem_zero_setup(desc->start, desc->end, desc->vm_flags);
+
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ desc->vm_file = file;
+ desc->vm_ops = &shmem_anon_vm_ops;
+
+ return 0;
+}
+
/**
* shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
* @mapping: the folio's address_space
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 11/16] mm: update mem char driver to use mmap_prepare
2025-09-10 20:22 ` [PATCH v2 11/16] mm: update mem char driver " Lorenzo Stoakes
@ 2025-09-18 19:11 ` Chris Mason
2025-09-19 5:13 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Chris Mason @ 2025-09-18 19:11 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Chris Mason, Andrew Morton, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed, 10 Sep 2025 21:22:06 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Update the mem char driver (backing /dev/mem and /dev/zero) to use the
> f_op->mmap_prepare hook rather than the deprecated f_op->mmap.
>
> The /dev/zero implementation has a rather unique and concerning
> characteristic in that it converts MAP_PRIVATE mmap() mappings to anonymous
> mappings when they are, in fact, not.
>
> The new f_op->mmap_prepare() can support this, but rather than introducing
> a helper function to perform this hack (and risk introducing other users),
> simply set desc->vm_ops to NULL here and add a comment describing what's
> going on.
>
> We also introduce shmem_zero_setup_desc() to allow for the shared mapping
> case via an f_op->mmap_prepare() hook, and generalise the code between this
> and shmem_zero_setup().
>
> We also use the desc->action.error_hook to filter the remap error to
> -EAGAIN to keep behaviour consistent.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> drivers/char/mem.c | 75 ++++++++++++++++++++++------------------
> include/linux/shmem_fs.h | 3 +-
> mm/shmem.c | 40 ++++++++++++++++-----
> 3 files changed, 76 insertions(+), 42 deletions(-)
>
[ ... ]
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 990e33c6a776..cb6ff00eb4cb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
[ ... ]
> @@ -5920,6 +5925,25 @@ int shmem_zero_setup(struct vm_area_struct *vma)
> return 0;
> }
>
> +/**
> + * shmem_zero_setup_desc - same as shmem_zero_setup, but determined by VMA
> + * descriptor for convenience.
> + * @desc: Describes VMA
> + * Returns: 0 on success, or error
> + */
> +int shmem_zero_setup_desc(struct vm_area_desc *desc)
> +{
> + struct file *file = __shmem_zero_setup(desc->start, desc->end, desc->vm_flags);
> +
> + if (IS_ERR(file))
> + return PTR_ERR(file);
> +
> + desc->vm_file = file;
> + desc->vm_ops = &shmem_anon_vm_ops;
> +
> + return 0;
> +}
> +
Hi Lorenzo,
shmem_zero_setup() does an if (vma->vm_file) fput(vma->vm_file) dance.
It looks like we need one here too?
-chris
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v2 11/16] mm: update mem char driver to use mmap_prepare
2025-09-18 19:11 ` Chris Mason
@ 2025-09-19 5:13 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-19 5:13 UTC (permalink / raw)
To: Chris Mason
Cc: Andrew Morton, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Thu, Sep 18, 2025 at 12:11:05PM -0700, Chris Mason wrote:
> On Wed, 10 Sep 2025 21:22:06 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > Update the mem char driver (backing /dev/mem and /dev/zero) to use the
> > f_op->mmap_prepare hook rather than the deprecated f_op->mmap.
> >
> > The /dev/zero implementation has a rather unique and concerning
> > characteristic in that it converts MAP_PRIVATE mmap() mappings to anonymous
> > mappings when they are, in fact, not.
> >
> > The new f_op->mmap_prepare() can support this, but rather than introducing
> > a helper function to perform this hack (and risk introducing other users),
> > simply set desc->vm_ops to NULL here and add a comment describing what's
> > going on.
> >
> > We also introduce shmem_zero_setup_desc() to allow for the shared mapping
> > case via an f_op->mmap_prepare() hook, and generalise the code between this
> > and shmem_zero_setup().
> >
> > We also use the desc->action.error_hook to filter the remap error to
> > -EAGAIN to keep behaviour consistent.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > drivers/char/mem.c | 75 ++++++++++++++++++++++------------------
> > include/linux/shmem_fs.h | 3 +-
> > mm/shmem.c | 40 ++++++++++++++++-----
> > 3 files changed, 76 insertions(+), 42 deletions(-)
> >
>
> [ ... ]
>
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 990e33c6a776..cb6ff00eb4cb 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
>
> [ ... ]
>
> > @@ -5920,6 +5925,25 @@ int shmem_zero_setup(struct vm_area_struct *vma)
> > return 0;
> > }
> >
> > +/**
> > + * shmem_zero_setup_desc - same as shmem_zero_setup, but determined by VMA
> > + * descriptor for convenience.
> > + * @desc: Describes VMA
> > + * Returns: 0 on success, or error
> > + */
> > +int shmem_zero_setup_desc(struct vm_area_desc *desc)
> > +{
> > + struct file *file = __shmem_zero_setup(desc->start, desc->end, desc->vm_flags);
> > +
> > + if (IS_ERR(file))
> > + return PTR_ERR(file);
> > +
> > + desc->vm_file = file;
> > + desc->vm_ops = &shmem_anon_vm_ops;
> > +
> > + return 0;
> > +}
> > +
>
> Hi Lorenzo,
>
> shmem_zero_setup() does an if (vma->vm_file) fput(vma->vm_file) dance.
>
> It looks like we need one here too?
No we don't, it's intentionally designed to avoid this because mmap_prepare is
invoked prior to the file pointer having been pinned like this.
This is necessary in mmap() but not in mmap_prepare(); equally, you can just
assign VMA flags or any other field without any need for special helpers or
lock/refcount dances etc.
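To make the difference concrete, compare the two hooks for a hypothetical
foo driver (a sketch only; foo_create_file() and the other foo names are
made up):

        /* .mmap runs against a live VMA whose file is already pinned. */
        static int foo_mmap(struct file *file, struct vm_area_struct *vma)
        {
                struct file *new_file = foo_create_file();

                if (IS_ERR(new_file))
                        return PTR_ERR(new_file);
                /* Must drop the existing pin before replacing the file. */
                if (vma->vm_file)
                        fput(vma->vm_file);
                vma->vm_file = new_file;
                return 0;
        }

        /* .mmap_prepare runs before the VMA exists, so nothing has been
         * pinned yet and a plain assignment suffices.
         */
        static int foo_mmap_prepare(struct vm_area_desc *desc)
        {
                struct file *new_file = foo_create_file();

                if (IS_ERR(new_file))
                        return PTR_ERR(new_file);
                desc->vm_file = new_file;
                return 0;
        }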
>
> -chris
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 12/16] mm: update resctl to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (10 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 11/16] mm: update mem char driver " Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-11 22:07 ` Reinette Chatre
2025-09-10 20:22 ` [PATCH v2 13/16] mm: update cramfs " Lorenzo Stoakes
` (4 subsequent siblings)
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Make use of the ability to specify a remap action within mmap_prepare to
update the resctl pseudo-lock to use mmap_prepare in favour of the
deprecated mmap hook.
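The general shape of such a conversion is as below (a sketch with
hypothetical foo names; pfn stands in for whatever the driver computes):

        static int foo_mmap_prepare(struct vm_area_desc *desc)
        {
                desc->vm_ops = &foo_vm_ops;
                /*
                 * Record the desired remap; the mmap logic performs it
                 * once the VMA has actually been established.
                 */
                mmap_action_remap(&desc->action, desc->start, pfn,
                                  vma_desc_size(desc), desc->page_prot);
                return 0;
        }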
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/resctrl/pseudo_lock.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
index 87bbc2605de1..e847df586766 100644
--- a/fs/resctrl/pseudo_lock.c
+++ b/fs/resctrl/pseudo_lock.c
@@ -995,10 +995,11 @@ static const struct vm_operations_struct pseudo_mmap_ops = {
.mremap = pseudo_lock_dev_mremap,
};
-static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
+static int pseudo_lock_dev_mmap_prepare(struct vm_area_desc *desc)
{
- unsigned long vsize = vma->vm_end - vma->vm_start;
- unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+ unsigned long off = desc->pgoff << PAGE_SHIFT;
+ unsigned long vsize = vma_desc_size(desc);
+ struct file *filp = desc->file;
struct pseudo_lock_region *plr;
struct rdtgroup *rdtgrp;
unsigned long physical;
@@ -1043,7 +1044,7 @@ static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
* Ensure changes are carried directly to the memory being mapped,
* do not allow copy-on-write mapping.
*/
- if (!(vma->vm_flags & VM_SHARED)) {
+ if (!(desc->vm_flags & VM_SHARED)) {
mutex_unlock(&rdtgroup_mutex);
return -EINVAL;
}
@@ -1055,12 +1056,11 @@ static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
memset(plr->kmem + off, 0, vsize);
- if (remap_pfn_range(vma, vma->vm_start, physical + vma->vm_pgoff,
- vsize, vma->vm_page_prot)) {
- mutex_unlock(&rdtgroup_mutex);
- return -EAGAIN;
- }
- vma->vm_ops = &pseudo_mmap_ops;
+ desc->vm_ops = &pseudo_mmap_ops;
+
+ mmap_action_remap(&desc->action, desc->start, physical + desc->pgoff,
+ vsize, desc->page_prot);
+
mutex_unlock(&rdtgroup_mutex);
return 0;
}
@@ -1071,7 +1071,7 @@ static const struct file_operations pseudo_lock_dev_fops = {
.write = NULL,
.open = pseudo_lock_dev_open,
.release = pseudo_lock_dev_release,
- .mmap = pseudo_lock_dev_mmap,
+ .mmap_prepare = pseudo_lock_dev_mmap_prepare,
};
int rdt_pseudo_lock_init(void)
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 12/16] mm: update resctl to use mmap_prepare
2025-09-10 20:22 ` [PATCH v2 12/16] mm: update resctl " Lorenzo Stoakes
@ 2025-09-11 22:07 ` Reinette Chatre
2025-09-12 10:14 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Reinette Chatre @ 2025-09-11 22:07 UTC (permalink / raw)
To: Lorenzo Stoakes, Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
Hi Lorenzo,
On 9/10/25 1:22 PM, Lorenzo Stoakes wrote:
> Make use of the ability to specify a remap action within mmap_prepare to
> update the resctl pseudo-lock to use mmap_prepare in favour of the
> deprecated mmap hook.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
Thank you.
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
This does not conflict with any of the resctrl changes currently
being queued in tip tree for inclusion during next merge window.
Reinette
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 12/16] mm: update resctl to use mmap_prepare
2025-09-11 22:07 ` Reinette Chatre
@ 2025-09-12 10:14 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-12 10:14 UTC (permalink / raw)
To: Reinette Chatre
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Dave Martin, James Morse, Alexander Viro,
Christian Brauner, Jan Kara, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Hugh Dickins,
Baolin Wang, Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov,
Jann Horn, Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel,
linux-csky, linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl,
linux-mm, ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Thu, Sep 11, 2025 at 03:07:43PM -0700, Reinette Chatre wrote:
> Hi Lorenzo,
>
> On 9/10/25 1:22 PM, Lorenzo Stoakes wrote:
> > Make use of the ability to specify a remap action within mmap_prepare to
> > update the resctl pseudo-lock to use mmap_prepare in favour of the
> > deprecated mmap hook.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
>
> Thank you.
>
> Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Thanks!
>
> This does not conflict with any of the resctrl changes currently
> being queued in tip tree for inclusion during next merge window.
Great :)
>
> Reinette
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH v2 13/16] mm: update cramfs to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (11 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 12/16] mm: update resctl " Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 14/16] fs/proc: add the proc_mmap_prepare hook for procfs Lorenzo Stoakes
` (3 subsequent siblings)
16 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
cramfs uses either a PFN remap or a mixedmap insertion; we are able to
determine which at the point of mmap_prepare and select the appropriate
action to perform using the vm_area_desc.
Note that there appears to have been a bug in this code, with the physical
address being specified as the PFN (!!) to vmf_insert_mixed(). This patch
fixes this issue.
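That is, the old code did the equivalent of the following, where address
holds a physical address (a sketch restating the removed lines below):

        /* Wrong: passes a physical address where a PFN is expected. */
        vmf_insert_mixed(vma, vma->vm_start + off, address + off);
        /* Intended (address being page-aligned): */
        vmf_insert_mixed(vma, vma->vm_start + off,
                         (address + off) >> PAGE_SHIFT);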
Finally, we trivially have to move the pr_debug() message indicating
what's happening so that it is emitted before the remap/mixedmap occurs.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/cramfs/inode.c | 46 ++++++++++++++++++++--------------------------
1 file changed, 20 insertions(+), 26 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index b002e9b734f9..2a41b30753a7 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -342,16 +342,17 @@ static bool cramfs_last_page_is_shared(struct inode *inode)
return memchr_inv(tail_data, 0, PAGE_SIZE - partial) ? true : false;
}
-static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+static int cramfs_physmem_mmap_prepare(struct vm_area_desc *desc)
{
+ struct file *file = desc->file;
struct inode *inode = file_inode(file);
struct cramfs_sb_info *sbi = CRAMFS_SB(inode->i_sb);
unsigned int pages, max_pages, offset;
- unsigned long address, pgoff = vma->vm_pgoff;
+ unsigned long address, pgoff = desc->pgoff;
char *bailout_reason;
int ret;
- ret = generic_file_readonly_mmap(file, vma);
+ ret = generic_file_readonly_mmap_prepare(desc);
if (ret)
return ret;
@@ -362,14 +363,14 @@ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
/* Could COW work here? */
bailout_reason = "vma is writable";
- if (vma->vm_flags & VM_WRITE)
+ if (desc->vm_flags & VM_WRITE)
goto bailout;
max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
bailout_reason = "beyond file limit";
if (pgoff >= max_pages)
goto bailout;
- pages = min(vma_pages(vma), max_pages - pgoff);
+ pages = min(vma_desc_pages(desc), max_pages - pgoff);
offset = cramfs_get_block_range(inode, pgoff, &pages);
bailout_reason = "unsuitable block layout";
@@ -391,38 +392,31 @@ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
goto bailout;
}
- if (pages == vma_pages(vma)) {
+ pr_debug("mapping %pD[%lu] at 0x%08lx (%u/%lu pages) "
+ "to vma 0x%08lx, page_prot 0x%llx\n", file,
+ pgoff, address, pages, vma_desc_pages(desc), desc->start,
+ (unsigned long long)pgprot_val(desc->page_prot));
+
+ if (pages == vma_desc_pages(desc)) {
/*
* The entire vma is mappable. remap_pfn_range() will
* make it distinguishable from a non-direct mapping
* in /proc/<pid>/maps by substituting the file offset
* with the actual physical address.
*/
- ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
- pages * PAGE_SIZE, vma->vm_page_prot);
+ mmap_action_remap(&desc->action, desc->start,
+ address >> PAGE_SHIFT, pages * PAGE_SIZE,
+ desc->page_prot);
} else {
/*
* Let's create a mixed map if we can't map it all.
* The normal paging machinery will take care of the
* unpopulated ptes via cramfs_read_folio().
*/
- int i;
- vm_flags_set(vma, VM_MIXEDMAP);
- for (i = 0; i < pages && !ret; i++) {
- vm_fault_t vmf;
- unsigned long off = i * PAGE_SIZE;
- vmf = vmf_insert_mixed(vma, vma->vm_start + off,
- address + off);
- if (vmf & VM_FAULT_ERROR)
- ret = vm_fault_to_errno(vmf, 0);
- }
+ mmap_action_mixedmap(&desc->action, desc->start,
+ address >> PAGE_SHIFT, pages);
}
- if (!ret)
- pr_debug("mapped %pD[%lu] at 0x%08lx (%u/%lu pages) "
- "to vma 0x%08lx, page_prot 0x%llx\n", file,
- pgoff, address, pages, vma_pages(vma), vma->vm_start,
- (unsigned long long)pgprot_val(vma->vm_page_prot));
return ret;
bailout:
@@ -434,9 +428,9 @@ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
#else /* CONFIG_MMU */
-static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+static int cramfs_physmem_mmap_prepare(struct vm_area_desc *desc)
{
- return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -ENOSYS;
+ return is_nommu_shared_mapping(desc->vm_flags) ? 0 : -ENOSYS;
}
static unsigned long cramfs_physmem_get_unmapped_area(struct file *file,
@@ -474,7 +468,7 @@ static const struct file_operations cramfs_physmem_fops = {
.llseek = generic_file_llseek,
.read_iter = generic_file_read_iter,
.splice_read = filemap_splice_read,
- .mmap = cramfs_physmem_mmap,
+ .mmap_prepare = cramfs_physmem_mmap_prepare,
#ifndef CONFIG_MMU
.get_unmapped_area = cramfs_physmem_get_unmapped_area,
.mmap_capabilities = cramfs_physmem_mmap_capabilities,
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 14/16] fs/proc: add the proc_mmap_prepare hook for procfs
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (12 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 13/16] mm: update cramfs " Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 15/16] fs/proc: update vmcore to use .proc_mmap_prepare Lorenzo Stoakes
` (2 subsequent siblings)
16 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
By adding this hook we enable procfs implementations to use the
.mmap_prepare hook rather than the deprecated .mmap one.
We treat this as if it were any other nested mmap hook and utilise the
.mmap_prepare compatibility layer if necessary.
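A converted procfs user would then look something like the below sketch
(the foo names are hypothetical):

        static int foo_proc_mmap_prepare(struct vm_area_desc *desc)
        {
                desc->vm_flags |= VM_DONTEXPAND;
                desc->vm_ops = &foo_vm_ops;
                return 0;
        }

        static const struct proc_ops foo_proc_ops = {
                .proc_open		= foo_open,
                .proc_release		= foo_release,
                .proc_mmap_prepare	= foo_proc_mmap_prepare,
        };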
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/proc/inode.c | 12 +++++++++---
include/linux/proc_fs.h | 1 +
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 129490151be1..609abbc84bf4 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -414,9 +414,15 @@ static long proc_reg_compat_ioctl(struct file *file, unsigned int cmd, unsigned
static int pde_mmap(struct proc_dir_entry *pde, struct file *file, struct vm_area_struct *vma)
{
- __auto_type mmap = pde->proc_ops->proc_mmap;
- if (mmap)
- return mmap(file, vma);
+ const struct file_operations f_op = {
+ .mmap = pde->proc_ops->proc_mmap,
+ .mmap_prepare = pde->proc_ops->proc_mmap_prepare,
+ };
+
+ if (f_op.mmap)
+ return f_op.mmap(file, vma);
+ else if (f_op.mmap_prepare)
+ return __compat_vma_mmap_prepare(&f_op, file, vma);
return -EIO;
}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index f139377f4b31..e5f65ebd62b8 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -47,6 +47,7 @@ struct proc_ops {
long (*proc_compat_ioctl)(struct file *, unsigned int, unsigned long);
#endif
int (*proc_mmap)(struct file *, struct vm_area_struct *);
+ int (*proc_mmap_prepare)(struct vm_area_desc *);
unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
} __randomize_layout;
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 15/16] fs/proc: update vmcore to use .proc_mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (13 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 14/16] fs/proc: add the proc_mmap_prepare hook for procfs Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-12 10:14 ` Lorenzo Stoakes
2025-09-10 20:22 ` [PATCH v2 16/16] kcov: update kcov to use mmap_prepare Lorenzo Stoakes
2025-09-10 21:38 ` [PATCH v2 00/16] expand mmap_prepare functionality, port more users Andrew Morton
16 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Now that we have the ability to specify a custom hook, we can handle even
highly customised behaviour.
As part of this change, we must also update remap_vmalloc_range_partial()
to optionally not update VMA flags. Other than the remap_vmalloc_range()
wrapper, vmcore is the only user of this function so we can simply go ahead
and add a parameter.
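The custom hook mechanism used here boils down to the pattern below (a
sketch, with hypothetical foo names):

        /* Invoked by the mmap logic once the VMA has been established. */
        static int foo_action(struct vm_area_struct *vma)
        {
                /* Perform remaps etc. against the now-real VMA. */
                return 0;
        }

        static int foo_mmap_prepare(struct vm_area_desc *desc)
        {
                /* Check parameters, set flags/vm_ops, then defer: */
                desc->action.type = MMAP_CUSTOM_ACTION;
                desc->action.custom.action_hook = foo_action;
                return 0;
        }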
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
arch/s390/kernel/crash_dump.c | 6 ++--
fs/proc/vmcore.c | 54 +++++++++++++++++++++++------------
include/linux/vmalloc.h | 10 +++----
mm/vmalloc.c | 16 +++++++++--
4 files changed, 57 insertions(+), 29 deletions(-)
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index d4839de8ce9d..44d7902f7e41 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -186,7 +186,7 @@ static int remap_oldmem_pfn_range_kdump(struct vm_area_struct *vma,
if (pfn < oldmem_data.size >> PAGE_SHIFT) {
size_old = min(size, oldmem_data.size - (pfn << PAGE_SHIFT));
- rc = remap_pfn_range(vma, from,
+ rc = remap_pfn_range_complete(vma, from,
pfn + (oldmem_data.start >> PAGE_SHIFT),
size_old, prot);
if (rc || size == size_old)
@@ -195,7 +195,7 @@ static int remap_oldmem_pfn_range_kdump(struct vm_area_struct *vma,
from += size_old;
pfn += size_old >> PAGE_SHIFT;
}
- return remap_pfn_range(vma, from, pfn, size, prot);
+ return remap_pfn_range_complete(vma, from, pfn, size, prot);
}
/*
@@ -220,7 +220,7 @@ static int remap_oldmem_pfn_range_zfcpdump(struct vm_area_struct *vma,
from += size_hsa;
pfn += size_hsa >> PAGE_SHIFT;
}
- return remap_pfn_range(vma, from, pfn, size, prot);
+ return remap_pfn_range_complete(vma, from, pfn, size, prot);
}
/*
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index f188bd900eb2..faf811ed9b15 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -254,7 +254,7 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
unsigned long size, pgprot_t prot)
{
prot = pgprot_encrypted(prot);
- return remap_pfn_range(vma, from, pfn, size, prot);
+ return remap_pfn_range_complete(vma, from, pfn, size, prot);
}
/*
@@ -308,7 +308,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
tsz = min(offset + (u64)dump->size - start, (u64)size);
buf = dump->buf + start - offset;
if (remap_vmalloc_range_partial(vma, dst, buf, 0,
- tsz))
+ tsz, /* set_vma= */false))
return -EFAULT;
size -= tsz;
@@ -588,24 +588,15 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
return ret;
}
-static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+static int mmap_prepare_action_vmcore(struct vm_area_struct *vma)
{
+ struct mmap_action action;
size_t size = vma->vm_end - vma->vm_start;
u64 start, end, len, tsz;
struct vmcore_range *m;
start = (u64)vma->vm_pgoff << PAGE_SHIFT;
end = start + size;
-
- if (size > vmcore_size || end > vmcore_size)
- return -EINVAL;
-
- if (vma->vm_flags & (VM_WRITE | VM_EXEC))
- return -EPERM;
-
- vm_flags_mod(vma, VM_MIXEDMAP, VM_MAYWRITE | VM_MAYEXEC);
- vma->vm_ops = &vmcore_mmap_ops;
-
len = 0;
if (start < elfcorebuf_sz) {
@@ -613,8 +604,10 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
tsz = min(elfcorebuf_sz - (size_t)start, size);
pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
- if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
- vma->vm_page_prot))
+
+ mmap_action_remap(&action, vma->vm_start, pfn, tsz,
+ vma->vm_page_prot);
+ if (mmap_action_complete(&action, vma))
return -EAGAIN;
size -= tsz;
start += tsz;
@@ -664,7 +657,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start, size);
kaddr = elfnotes_buf + start - elfcorebuf_sz - vmcoredd_orig_sz;
if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
- kaddr, 0, tsz))
+ kaddr, 0, tsz, /* set_vma= */false))
goto fail;
size -= tsz;
@@ -700,8 +693,33 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
return -EAGAIN;
}
+
+static int mmap_prepare_vmcore(struct vm_area_desc *desc)
+{
+ size_t size = vma_desc_size(desc);
+ u64 start, end;
+
+ start = (u64)desc->pgoff << PAGE_SHIFT;
+ end = start + size;
+
+ if (size > vmcore_size || end > vmcore_size)
+ return -EINVAL;
+
+ if (desc->vm_flags & (VM_WRITE | VM_EXEC))
+ return -EPERM;
+
+ /* This is a unique case where we set both PFN map and mixed map flags. */
+ desc->vm_flags |= VM_MIXEDMAP | VM_REMAP_FLAGS;
+ desc->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+ desc->vm_ops = &vmcore_mmap_ops;
+
+ desc->action.type = MMAP_CUSTOM_ACTION;
+ desc->action.custom.action_hook = mmap_prepare_action_vmcore;
+
+ return 0;
+}
#else
-static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+static int mmap_prepare_vmcore(struct vm_area_desc *desc)
{
return -ENOSYS;
}
@@ -712,7 +730,7 @@ static const struct proc_ops vmcore_proc_ops = {
.proc_release = release_vmcore,
.proc_read_iter = read_vmcore,
.proc_lseek = default_llseek,
- .proc_mmap = mmap_vmcore,
+ .proc_mmap_prepare = mmap_prepare_vmcore,
};
static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index eb54b7b3202f..588810e571aa 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -215,12 +215,12 @@ extern void *vmap(struct page **pages, unsigned int count,
void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot);
extern void vunmap(const void *addr);
-extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
- unsigned long uaddr, void *kaddr,
- unsigned long pgoff, unsigned long size);
+int remap_vmalloc_range_partial(struct vm_area_struct *vma,
+ unsigned long uaddr, void *kaddr, unsigned long pgoff,
+ unsigned long size, bool set_vma);
-extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
- unsigned long pgoff);
+int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
+ unsigned long pgoff);
int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
struct page **pages, unsigned int page_shift);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9fc86ddf1711..3dd9d5c441d8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4531,6 +4531,7 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
* @kaddr: virtual address of vmalloc kernel memory
* @pgoff: offset from @kaddr to start at
* @size: size of map area
+ * @set_vma: If true, update VMA flags
*
* Returns: 0 for success, -Exxx on failure
*
@@ -4543,7 +4544,7 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
*/
int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
void *kaddr, unsigned long pgoff,
- unsigned long size)
+ unsigned long size, bool set_vma)
{
struct vm_struct *area;
unsigned long off;
@@ -4569,6 +4570,10 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
return -EINVAL;
kaddr += off;
+ /* If we shouldn't modify VMA flags, vm_insert_page() mustn't. */
+ if (!set_vma && !(vma->vm_flags & VM_MIXEDMAP))
+ return -EINVAL;
+
do {
struct page *page = vmalloc_to_page(kaddr);
int ret;
@@ -4582,7 +4587,11 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
size -= PAGE_SIZE;
} while (size > 0);
- vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
+ if (set_vma)
+ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
+ else
+ VM_WARN_ON_ONCE((vma->vm_flags & (VM_DONTEXPAND | VM_DONTDUMP)) !=
+ (VM_DONTEXPAND | VM_DONTDUMP));
return 0;
}
@@ -4606,7 +4615,8 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
{
return remap_vmalloc_range_partial(vma, vma->vm_start,
addr, pgoff,
- vma->vm_end - vma->vm_start);
+ vma->vm_end - vma->vm_start,
+ /* set_vma= */ true);
}
EXPORT_SYMBOL(remap_vmalloc_range);
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 15/16] fs/proc: update vmcore to use .proc_mmap_prepare
2025-09-10 20:22 ` [PATCH v2 15/16] fs/proc: update vmcore to use .proc_mmap_prepare Lorenzo Stoakes
@ 2025-09-12 10:14 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-12 10:14 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
Hi Andrew,
Can you apply the below fix-patch to address a trivial variable use warning,
thanks!
Cheers, Lorenzo
----8<----
From b9d0c3b39d97309bf572af443e2190bb20f6b976 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 12 Sep 2025 11:12:10 +0100
Subject: [PATCH] vmcore fix
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
fs/proc/vmcore.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index faf811ed9b15..028c8c904cbb 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -592,11 +592,10 @@ static int mmap_prepare_action_vmcore(struct vm_area_struct *vma)
{
struct mmap_action action;
size_t size = vma->vm_end - vma->vm_start;
- u64 start, end, len, tsz;
+ u64 start, len, tsz;
struct vmcore_range *m;
start = (u64)vma->vm_pgoff << PAGE_SHIFT;
- end = start + size;
len = 0;
if (start < elfcorebuf_sz) {
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (14 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 15/16] fs/proc: update vmcore to use .proc_mmap_prepare Lorenzo Stoakes
@ 2025-09-10 20:22 ` Lorenzo Stoakes
2025-09-15 12:16 ` Jason Gunthorpe
2025-09-18 19:45 ` Chris Mason
2025-09-10 21:38 ` [PATCH v2 00/16] expand mmap_prepare functionality, port more users Andrew Morton
16 siblings, 2 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-10 20:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
We can use the mmap insert pages functionality provided for use in
mmap_prepare to insert the kcov pages as required.
This does necessitate an allocation, but since it's in the mmap path this
doesn't seem egregious. The allocation/freeing of the pages array is
handled automatically by mmap_action_mixedmap_pages() and the mapping
logic.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
kernel/kcov.c | 42 ++++++++++++++++++++++++++----------------
1 file changed, 26 insertions(+), 16 deletions(-)
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 1d85597057e1..2bcf403e5f6f 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -484,31 +484,41 @@ void kcov_task_exit(struct task_struct *t)
kcov_put(kcov);
}
-static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
+static int kcov_mmap_error(int err)
+{
+ pr_warn_once("kcov: vm_insert_page() failed\n");
+ return err;
+}
+
+static int kcov_mmap_prepare(struct vm_area_desc *desc)
{
int res = 0;
- struct kcov *kcov = vma->vm_file->private_data;
- unsigned long size, off;
- struct page *page;
+ struct kcov *kcov = desc->file->private_data;
+ unsigned long size, nr_pages, i;
+ struct page **pages;
unsigned long flags;
spin_lock_irqsave(&kcov->lock, flags);
size = kcov->size * sizeof(unsigned long);
- if (kcov->area == NULL || vma->vm_pgoff != 0 ||
- vma->vm_end - vma->vm_start != size) {
+ if (kcov->area == NULL || desc->pgoff != 0 ||
+ vma_desc_size(desc) != size) {
res = -EINVAL;
goto exit;
}
spin_unlock_irqrestore(&kcov->lock, flags);
- vm_flags_set(vma, VM_DONTEXPAND);
- for (off = 0; off < size; off += PAGE_SIZE) {
- page = vmalloc_to_page(kcov->area + off);
- res = vm_insert_page(vma, vma->vm_start + off, page);
- if (res) {
- pr_warn_once("kcov: vm_insert_page() failed\n");
- return res;
- }
- }
+
+ desc->vm_flags |= VM_DONTEXPAND;
+ nr_pages = size >> PAGE_SHIFT;
+
+ pages = mmap_action_mixedmap_pages(&desc->action, desc->start,
+ nr_pages);
+ if (!pages)
+ return -ENOMEM;
+
+ for (i = 0; i < nr_pages; i++)
+ pages[i] = vmalloc_to_page(kcov->area + i * PAGE_SIZE);
+ desc->action.error_hook = kcov_mmap_error;
+
return 0;
exit:
spin_unlock_irqrestore(&kcov->lock, flags);
@@ -761,7 +771,7 @@ static const struct file_operations kcov_fops = {
.open = kcov_open,
.unlocked_ioctl = kcov_ioctl,
.compat_ioctl = kcov_ioctl,
- .mmap = kcov_mmap,
+ .mmap_prepare = kcov_mmap_prepare,
.release = kcov_close,
};
--
2.51.0
^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-10 20:22 ` [PATCH v2 16/16] kcov: update kcov to use mmap_prepare Lorenzo Stoakes
@ 2025-09-15 12:16 ` Jason Gunthorpe
2025-09-15 12:43 ` Lorenzo Stoakes
2025-09-18 19:45 ` Chris Mason
1 sibling, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 12:16 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Wed, Sep 10, 2025 at 09:22:11PM +0100, Lorenzo Stoakes wrote:
> +static int kcov_mmap_prepare(struct vm_area_desc *desc)
> {
> int res = 0;
> - struct kcov *kcov = vma->vm_file->private_data;
> - unsigned long size, off;
> - struct page *page;
> + struct kcov *kcov = desc->file->private_data;
> + unsigned long size, nr_pages, i;
> + struct page **pages;
> unsigned long flags;
>
> spin_lock_irqsave(&kcov->lock, flags);
> size = kcov->size * sizeof(unsigned long);
> - if (kcov->area == NULL || vma->vm_pgoff != 0 ||
> - vma->vm_end - vma->vm_start != size) {
> + if (kcov->area == NULL || desc->pgoff != 0 ||
> + vma_desc_size(desc) != size) {
IMHO these range checks should be cleaned up into a helper:
/* Returns true if the VMA falls within starting_pgoff to
starting_pgoff + ROUND_DOWN(length_bytes, PAGE_SIZE))
Is careful to avoid any arithmetic overflow.
*/
vma_desc_check_range(desc, starting_pgoff=0, length_bytes=size);
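Something like the following, perhaps (an untested sketch on top of the
vma_desc_pages() helper this series introduces):

        static inline bool vma_desc_check_range(struct vm_area_desc *desc,
                                                unsigned long starting_pgoff,
                                                unsigned long length_bytes)
        {
                const unsigned long nr_pages = length_bytes >> PAGE_SHIFT;
                unsigned long off;

                if (desc->pgoff < starting_pgoff)
                        return false;
                off = desc->pgoff - starting_pgoff;
                if (off >= nr_pages)
                        return false;
                /* All subtractions are guarded, so nothing can overflow. */
                return vma_desc_pages(desc) <= nr_pages - off;
        }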
> + desc->vm_flags |= VM_DONTEXPAND;
> + nr_pages = size >> PAGE_SHIFT;
> +
> + pages = mmap_action_mixedmap_pages(&desc->action, desc->start,
> + nr_pages);
> + if (!pages)
> + return -ENOMEM;
> +
> + for (i = 0; i < nr_pages; i++)
> + pages[i] = vmalloc_to_page(kcov->area + i * PAGE_SIZE);
This is not a mixed map.
All the memory comes from vmalloc_user() which makes them normal
struct pages with refcounts.
If anything the action should be called mmap_action_vmalloc_user() to
match how the memory was allocated instead of open coding something.
Jason
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-15 12:16 ` Jason Gunthorpe
@ 2025-09-15 12:43 ` Lorenzo Stoakes
2025-09-15 12:48 ` Jason Gunthorpe
0 siblings, 1 reply; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 12:43 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 09:16:17AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 10, 2025 at 09:22:11PM +0100, Lorenzo Stoakes wrote:
> > +static int kcov_mmap_prepare(struct vm_area_desc *desc)
> > {
> > int res = 0;
> > - struct kcov *kcov = vma->vm_file->private_data;
> > - unsigned long size, off;
> > - struct page *page;
> > + struct kcov *kcov = desc->file->private_data;
> > + unsigned long size, nr_pages, i;
> > + struct page **pages;
> > unsigned long flags;
> >
> > spin_lock_irqsave(&kcov->lock, flags);
> > size = kcov->size * sizeof(unsigned long);
> > - if (kcov->area == NULL || vma->vm_pgoff != 0 ||
> > - vma->vm_end - vma->vm_start != size) {
> > + if (kcov->area == NULL || desc->pgoff != 0 ||
> > + vma_desc_size(desc) != size) {
>
> IMHO these range checks should be cleaned up into a helper:
>
> /* Returns true if the VMA falls within starting_pgoff to
> starting_pgoff + ROUND_DOWN(length_bytes, PAGE_SIZE))
> Is careful to avoid any arithmetic overflow.
> */
Right, but I can't refactor every driver I touch, it's not really tractable. I'd
like to get this change done before I retire :)
> vma_desc_check_range(desc, starting_pgoff=0, length_bytes=size);
>
> > + desc->vm_flags |= VM_DONTEXPAND;
> > + nr_pages = size >> PAGE_SHIFT;
> > +
> > + pages = mmap_action_mixedmap_pages(&desc->action, desc->start,
> > + nr_pages);
> > + if (!pages)
> > + return -ENOMEM;
> > +
> > + for (i = 0; i < nr_pages; i++)
> > + pages[i] = vmalloc_to_page(kcov->area + i * PAGE_SIZE);
>
> This is not a mixed map.
>
> All the memory comes from vmalloc_user() which makes them normal
> struct pages with refcounts.
>
> If anything the action should be called mmap_action_vmalloc_user() to
> match how the memory was allocated instead of open coding something.
Again we're getting into the same issue - my workload doesn't really permit
me to refactor every user of .mmap beyond converting sensibly to the new
scheme.
I think this kind of change is out of scope for the series.
I'd rather make this as apples-to-apples as possible for now so it can be
done vaguely mechanically.
Of course we can follow up with improvements later.
>
> Jason
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-15 12:43 ` Lorenzo Stoakes
@ 2025-09-15 12:48 ` Jason Gunthorpe
2025-09-15 13:01 ` Lorenzo Stoakes
0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-09-15 12:48 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 01:43:50PM +0100, Lorenzo Stoakes wrote:
> > > + if (kcov->area == NULL || desc->pgoff != 0 ||
> > > + vma_desc_size(desc) != size) {
> >
> > IMHO these range checks should be cleaned up into a helper:
> >
> > /* Returns true if the VMA falls within starting_pgoff to
> > starting_pgoff + ROUND_DOWN(length_bytes, PAGE_SIZE))
> > Is careful to avoid any arithmetic overflow.
> > */
>
> Right, but I can't refactor every driver I touch, it's not really tractable. I'd
> like to get this change done before I retire :)
I don't think it is a big deal, and these helpers should be part of
the new api. You are reading and touching anyhow.
> > If anything the action should be called mmap_action_vmalloc_user() to
> > match how the memory was allocated instead of open coding something.
>
> Again we're getting into the same issue - my workload doesn't really permit
> me to refactor every user of .mmap beyond converting sensibly to the new
> scheme.
If you are adding this explicit action concept then it should be a
sane set of actions. Using a mixed map action to insert a vmalloc_user
is not a reasonable thing to do.
Jason
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-15 12:48 ` Jason Gunthorpe
@ 2025-09-15 13:01 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-15 13:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Andrew Morton, Jonathan Corbet, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev
On Mon, Sep 15, 2025 at 09:48:01AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 01:43:50PM +0100, Lorenzo Stoakes wrote:
> > > > + if (kcov->area == NULL || desc->pgoff != 0 ||
> > > > + vma_desc_size(desc) != size) {
> > >
> > > IMHO these range checks should be cleaned up into a helper:
> > >
> > > /* Returns true if the VMA falls within starting_pgoff to
> > > starting_pgoff + ROUND_DOWN(length_bytes, PAGE_SIZE))
> > > Is careful to avoid any arithmetic overflow.
> > > */
> >
> > Right, but I can't refactor every driver I touch; it's not really tractable. I'd
> > like to get this change done before I retire :)
>
> I don't think it is a big deal, and these helpers should be part of
> the new API. You are reading and touching this code anyhow.
Multiplied by ~230 instances, it becomes a big deal.
>
> > > If anything the action should be called mmap_action_vmalloc_user() to
> > > match how the memory was allocated instead of open coding something.
> >
> > Again we're getting into the same issue - my workload doesn't really permit
> > me to refactor every user of .mmap beyond converting sensibly to the new
> > scheme.
>
> If you are adding this explicit action concept then it should be a
> sane set of actions. Using a mixed map action to insert a vmalloc_user
> is not a reasonable thing to do.
Right, I'm obviously intending there to be a sane interface.
And there are users who use mixed map to insert actual mixed map pages, so
having an interface for _that_ isn't crazy. So it's not like this is
compromising that.
(As an aside, we need to clean up a lot there anyway; it's a mess, but
that's out of scope here.)
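To make the shape of that suggestion concrete, a sketch of how the tail of
kcov_mmap_prepare() might then read; mmap_action_vmalloc_user() is
hypothetical (the name is Jason's proposal, not an existing API), while
desc->vm_flags, desc->action and vma_desc_size() follow this series:

	/*
	 * Hypothetical helper: record an action that maps the
	 * vmalloc_user() buffer at kcov->area, rather than open coding
	 * a mixed map insertion of its individual pages.
	 */
	desc->vm_flags |= VM_DONTEXPAND;
	mmap_action_vmalloc_user(&desc->action, kcov->area,
				 vma_desc_size(desc));
	return 0;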
>
> Jason
Anyway, for the sake of getting this series in, since you seem adamant, I'll
go ahead and refactor in this case. But it's really not reasonable to
expect me to do this in every instance.
I will obviously try my best to ensure the API is as good as it can be, and
adapted to what mmap users need. That bit I am trying to get as right as I
can...
But in each individual driver's case, we have to be pragmatic.
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-10 20:22 ` [PATCH v2 16/16] kcov: update kcov to use mmap_prepare Lorenzo Stoakes
2025-09-15 12:16 ` Jason Gunthorpe
@ 2025-09-18 19:45 ` Chris Mason
2025-09-19 5:10 ` Lorenzo Stoakes
1 sibling, 1 reply; 55+ messages in thread
From: Chris Mason @ 2025-09-18 19:45 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Chris Mason, Andrew Morton, Matthew Wilcox, Guo Ren,
Thomas Bogendoerfer, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
David S . Miller, Andreas Larsson, Arnd Bergmann,
Greg Kroah-Hartman, Dan Williams, Vishal Verma, Dave Jiang,
Nicolas Pitre, Muchun Song, Oscar Salvador, David Hildenbrand,
Konstantin Komarov, Baoquan He, Vivek Goyal, Dave Young,
Tony Luck, Reinette Chatre, Dave Martin, James Morse,
Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Hugh Dickins, Baolin Wang, Uladzislau Rezki, Dmitry Vyukov,
Andrey Konovalov, Jann Horn, Pedro Falcato, linux-doc,
linux-kernel, linux-fsdevel, linux-csky, linux-mips, linux-s390,
sparclinux, nvdimm, linux-cxl, linux-mm, ntfs3, kexec, kasan-dev,
Jason Gunthorpe
On Wed, 10 Sep 2025 21:22:11 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> We can use the mmap insert pages functionality provided for use in
> mmap_prepare to insert the kcov pages as required.
>
> This does necessitate an allocation, but since it's in the mmap path this
> doesn't seem egregious. The allocation/freeing of the pages array is
> handled automatically by vma_desc_set_mixedmap_pages() and the mapping
> logic.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> kernel/kcov.c | 42 ++++++++++++++++++++++++++----------------
> 1 file changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/kcov.c b/kernel/kcov.c
> index 1d85597057e1..2bcf403e5f6f 100644
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -484,31 +484,41 @@ void kcov_task_exit(struct task_struct *t)
> kcov_put(kcov);
> }
>
> -static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
> +static int kcov_mmap_error(int err)
> +{
> + pr_warn_once("kcov: vm_insert_page() failed\n");
> + return err;
> +}
> +
> +static int kcov_mmap_prepare(struct vm_area_desc *desc)
> {
> int res = 0;
> - struct kcov *kcov = vma->vm_file->private_data;
> - unsigned long size, off;
> - struct page *page;
> + struct kcov *kcov = desc->file->private_data;
> + unsigned long size, nr_pages, i;
> + struct page **pages;
> unsigned long flags;
>
> spin_lock_irqsave(&kcov->lock, flags);
> size = kcov->size * sizeof(unsigned long);
> - if (kcov->area == NULL || vma->vm_pgoff != 0 ||
> - vma->vm_end - vma->vm_start != size) {
> + if (kcov->area == NULL || desc->pgoff != 0 ||
> + vma_desc_size(desc) != size) {
> res = -EINVAL;
> goto exit;
> }
> spin_unlock_irqrestore(&kcov->lock, flags);
> - vm_flags_set(vma, VM_DONTEXPAND);
> - for (off = 0; off < size; off += PAGE_SIZE) {
> - page = vmalloc_to_page(kcov->area + off);
> - res = vm_insert_page(vma, vma->vm_start + off, page);
> - if (res) {
> - pr_warn_once("kcov: vm_insert_page() failed\n");
> - return res;
> - }
> - }
> +
> + desc->vm_flags |= VM_DONTEXPAND;
> + nr_pages = size >> PAGE_SHIFT;
> +
> + pages = mmap_action_mixedmap_pages(&desc->action, desc->start,
> + nr_pages);
Hi Lorenzo,
Not sure if it belongs here before the EINVAL tests, but it looks like
kcov->size doesn't have any page alignment. I think size could be
4000 bytes or other unaligned values, so nr_pages should round up.
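A sketch of the rounding being suggested, reusing the names from the quoted
patch; DIV_ROUND_UP() is the stock kernel macro:

	/*
	 * Round up so a size that is not a multiple of PAGE_SIZE
	 * (e.g. 4000 bytes) still covers its final, partial page.
	 */
	nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);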
-chris
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 16/16] kcov: update kcov to use mmap_prepare
2025-09-18 19:45 ` Chris Mason
@ 2025-09-19 5:10 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-19 5:10 UTC (permalink / raw)
To: Chris Mason
Cc: Andrew Morton, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Thu, Sep 18, 2025 at 12:45:38PM -0700, Chris Mason wrote:
> On Wed, 10 Sep 2025 21:22:11 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > We can use the mmap insert pages functionality provided for use in
> > mmap_prepare to insert the kcov pages as required.
> >
> > This does necessitate an allocation, but since it's in the mmap path this
> > doesn't seem egregious. The allocation/freeing of the pages array is
> > handled automatically by vma_desc_set_mixedmap_pages() and the mapping
> > logic.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > kernel/kcov.c | 42 ++++++++++++++++++++++++++----------------
> > 1 file changed, 26 insertions(+), 16 deletions(-)
> >
> > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > index 1d85597057e1..2bcf403e5f6f 100644
> > --- a/kernel/kcov.c
> > +++ b/kernel/kcov.c
> > @@ -484,31 +484,41 @@ void kcov_task_exit(struct task_struct *t)
> > kcov_put(kcov);
> > }
> >
> > -static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
> > +static int kcov_mmap_error(int err)
> > +{
> > + pr_warn_once("kcov: vm_insert_page() failed\n");
> > + return err;
> > +}
> > +
> > +static int kcov_mmap_prepare(struct vm_area_desc *desc)
> > {
> > int res = 0;
> > - struct kcov *kcov = vma->vm_file->private_data;
> > - unsigned long size, off;
> > - struct page *page;
> > + struct kcov *kcov = desc->file->private_data;
> > + unsigned long size, nr_pages, i;
> > + struct page **pages;
> > unsigned long flags;
> >
> > spin_lock_irqsave(&kcov->lock, flags);
> > size = kcov->size * sizeof(unsigned long);
> > - if (kcov->area == NULL || vma->vm_pgoff != 0 ||
> > - vma->vm_end - vma->vm_start != size) {
> > + if (kcov->area == NULL || desc->pgoff != 0 ||
> > + vma_desc_size(desc) != size) {
> > res = -EINVAL;
> > goto exit;
> > }
> > spin_unlock_irqrestore(&kcov->lock, flags);
> > - vm_flags_set(vma, VM_DONTEXPAND);
> > - for (off = 0; off < size; off += PAGE_SIZE) {
> > - page = vmalloc_to_page(kcov->area + off);
> > - res = vm_insert_page(vma, vma->vm_start + off, page);
> > - if (res) {
> > - pr_warn_once("kcov: vm_insert_page() failed\n");
> > - return res;
> > - }
> > - }
> > +
> > + desc->vm_flags |= VM_DONTEXPAND;
> > + nr_pages = size >> PAGE_SHIFT;
> > +
> > + pages = mmap_action_mixedmap_pages(&desc->action, desc->start,
> > + nr_pages);
>
> Hi Lorenzo,
>
> Not sure if it belongs here before the EINVAL tests, but it looks like
> kcov->size doesn't have any page alignment. I think size could be
> 4000 bytes or other unaligned values, so nr_pages should round up.
Thanks, you may well be right, but this series has been respun and I no
longer touch kcov. :)
I am at v4 now -
https://lore.kernel.org/linux-mm/cover.1758135681.git.lorenzo.stoakes@oracle.com/
- apologies for the quick turnaround, but I am going to Kernel Recipes soon and
then on vacation, so I wanted to get this wrapped up!
>
> -chris
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 00/16] expand mmap_prepare functionality, port more users
2025-09-10 20:21 [PATCH v2 00/16] expand mmap_prepare functionality, port more users Lorenzo Stoakes
` (15 preceding siblings ...)
2025-09-10 20:22 ` [PATCH v2 16/16] kcov: update kcov to use mmap_prepare Lorenzo Stoakes
@ 2025-09-10 21:38 ` Andrew Morton
2025-09-11 5:19 ` Lorenzo Stoakes
16 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2025-09-10 21:38 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Wed, 10 Sep 2025 21:21:55 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
> callback"), The f_op->mmap hook has been deprecated in favour of
> f_op->mmap_prepare.
>
> This was introduced in order to make it possible for us to eventually
> eliminate the f_op->mmap hook which is highly problematic as it allows
> drivers and filesystems raw access to a VMA which is not yet correctly
> initialised.
>
> This hook also introduced complexity for the memory mapping operation, as
> we must correctly unwind what we do should an error arise.
>
> Overall this interface being so open has caused significant problems for
> us, including security issues; it is important for us to simply eliminate
> this as a source of problems.
>
> Therefore this series continues what was established by extending the
> functionality further to permit more drivers and filesystems to use
> mmap_prepare.
Cool, I'll add this to mm-new but I'll suppress the usual emails.
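For readers skimming the thread, the two file_operations hooks being migrated
between have roughly these shapes (per the commit cited in the quoted cover
letter):

	/* Legacy hook: handed a VMA that is not yet fully initialised. */
	int (*mmap)(struct file *, struct vm_area_struct *);

	/* Replacement: fills in a descriptor before the VMA is created. */
	int (*mmap_prepare)(struct vm_area_desc *);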
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH v2 00/16] expand mmap_prepare functionality, port more users
2025-09-10 21:38 ` [PATCH v2 00/16] expand mmap_prepare functionality, port more users Andrew Morton
@ 2025-09-11 5:19 ` Lorenzo Stoakes
0 siblings, 0 replies; 55+ messages in thread
From: Lorenzo Stoakes @ 2025-09-11 5:19 UTC (permalink / raw)
To: Andrew Morton
Cc: Jonathan Corbet, Matthew Wilcox, Guo Ren, Thomas Bogendoerfer,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, David S . Miller,
Andreas Larsson, Arnd Bergmann, Greg Kroah-Hartman, Dan Williams,
Vishal Verma, Dave Jiang, Nicolas Pitre, Muchun Song,
Oscar Salvador, David Hildenbrand, Konstantin Komarov, Baoquan He,
Vivek Goyal, Dave Young, Tony Luck, Reinette Chatre, Dave Martin,
James Morse, Alexander Viro, Christian Brauner, Jan Kara,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Hugh Dickins, Baolin Wang,
Uladzislau Rezki, Dmitry Vyukov, Andrey Konovalov, Jann Horn,
Pedro Falcato, linux-doc, linux-kernel, linux-fsdevel, linux-csky,
linux-mips, linux-s390, sparclinux, nvdimm, linux-cxl, linux-mm,
ntfs3, kexec, kasan-dev, Jason Gunthorpe
On Wed, Sep 10, 2025 at 02:38:45PM -0700, Andrew Morton wrote:
> On Wed, 10 Sep 2025 21:21:55 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
> > callback"), The f_op->mmap hook has been deprecated in favour of
> > f_op->mmap_prepare.
> >
> > This was introduced in order to make it possible for us to eventually
> > eliminate the f_op->mmap hook which is highly problematic as it allows
> > drivers and filesystems raw access to a VMA which is not yet correctly
> > initialised.
> >
> > This hook also introduced complexity for the memory mapping operation, as
> > we must correctly unwind what we do should an error arise.
> >
> > Overall this interface being so open has caused significant problems for
> > us, including security issues; it is important for us to simply eliminate
> > this as a source of problems.
> >
> > Therefore this series continues what was established by extending the
> > functionality further to permit more drivers and filesystems to use
> > mmap_prepare.
>
> Cool, I'll add this to mm-new but I'll suppress the usual emails.
Thanks!
^ permalink raw reply [flat|nested] 55+ messages in thread