* Re: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()
From: Peter Xu @ 2025-12-04 15:19 UTC
To: kvm, linux-mm, linux-kernel
Cc: Jason Gunthorpe, Nico Pache, Zi Yan, Alex Mastro,
David Hildenbrand, Alex Williamson, Zhi Wang, David Laight,
Yi Liu, Ankit Agrawal, Kevin Tian, Andrew Morton,
David Hildenbrand, Matthew Wilcox, Jonathan Corbet, linux-fsdevel,
linux-doc, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
Jann Horn, Pedro Falcato, Alexander Viro, Christian Brauner,
Jan Kara, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
I forgot to Cc the mm/fs maintainers on the 1st and 2nd patches in this
series; my apologies. The whole series can be found here:
https://lore.kernel.org/r/20251204151003.171039-1-peterx@redhat.com
I'll update the Cc list when I repost.
Thanks,
On Thu, Dec 04, 2025 at 10:10:01AM -0500, Peter Xu wrote:
> Add one new file operation, get_mapping_order(). It can be used by file
> backends to report mapping order hints.
>
> By default, Linux assumes mappings are done in PAGE_SIZE chunks. With this
> hint, the driver can report that it may map chunks larger than PAGE_SIZE,
> and the VA allocator will then try to use that chunk size as the alignment
> when allocating the VA range.
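The idea of allocating a VA range aligned to the hinted chunk size can be sketched as a small userspace model. This is an illustrative assumption, not the kernel's code: the hypothetical helper model_align_va() takes the start `base` of a free VA range found for `len + align` bytes and moves it forward by less than `align`, so that the VA and the file offset `off` end up congruent modulo `align` (chunk boundaries in the file then line up with chunk boundaries in the VA).

```c
#include <assert.h>

/*
 * Hypothetical model (not kernel code). 'base' is assumed to be the start
 * of a free VA range of at least len + align bytes; 'align' must be a
 * power of two. The returned VA is at most align - 1 bytes above 'base',
 * so [va, va + len) still fits inside the padded range.
 */
static unsigned long model_align_va(unsigned long base, unsigned long off,
                                    unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Skip just enough bytes that (va - off) is a multiple of align. */
    return base + ((off - base) & (align - 1));
}
```

For example, assuming a 64-bit unsigned long, with `align` = 2 MiB and `off` = 0, a free range starting at 0x7f0000001000 yields 0x7f0000200000.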
>
> This is useful because, when the chunks to be mapped are larger than
> PAGE_SIZE, VA alignment matters: the VA needs to be aligned to the chunk
> size for a chunk to be mapped as a whole.
>
> That said, whatever alignment is used for the VA allocation, the driver
> still decides what chunk size to map with, and it is also fine for it to
> keep mapping in PAGE_SIZE.
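To illustrate why VA alignment matters while the driver still decides the mapping size, here is a hedged userspace model of a hypothetical fault-time decision. The helper model_pick_map_size() and its parameters are assumptions for illustration only: it maps a whole chunk of `4096 << order` bytes only when the chunk-aligned VA around the fault lies inside the VMA and the physical address backing it is also chunk-aligned; otherwise it falls back to a single 4 KiB page.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/*
 * Hypothetical model (not kernel code) of a driver fault path.
 * [vma_start, vma_end) is the mapping and 'base_phys' is the physical
 * address mapped at vma_start. Returns the number of bytes to map.
 */
static unsigned long model_pick_map_size(unsigned long vma_start,
                                         unsigned long vma_end,
                                         unsigned long base_phys,
                                         unsigned long fault_va,
                                         unsigned int order)
{
    unsigned long chunk = MODEL_PAGE_SIZE << order;
    unsigned long chunk_va = fault_va & ~(chunk - 1);
    unsigned long chunk_pa;

    /* The whole aligned chunk must lie inside the VMA. */
    if (chunk_va < vma_start || vma_end - chunk_va < chunk)
        return MODEL_PAGE_SIZE;

    /* The backing physical address must be chunk-aligned as well. */
    chunk_pa = base_phys + (chunk_va - vma_start);
    if (chunk_pa & (chunk - 1))
        return MODEL_PAGE_SIZE;

    return chunk;
}
```

With a 2 MiB-aligned VMA and backing address, a fault can be served with one 2 MiB entry; shift the same VMA by one page and every fault in it falls back to 4 KiB pages, which is the situation the alignment hint is meant to avoid.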
>
> get_mapping_order() takes three parameters: the file object pointer,
> followed by the pgoff and size of the mmap() request. The return value is
> the order; a positive order requests an alignment of PAGE_SIZE << order.
> Returning zero (or a negative value, which is treated as zero) behaves as
> if no hint were provided, i.e. the alignment stays PAGE_SIZE.
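A backend's policy for the return value might look like the following userspace sketch. Everything here is hypothetical: the real callback also receives the struct file pointer (omitted), and the 4 KiB page size and the order-9 (2 MiB) maximum chunk are assumptions. model_get_mapping_order() returns the largest order for which the requested file range, starting at byte `pgoff << 12` and spanning `len` bytes, contains at least one whole, naturally aligned chunk, and 0 (no hint) otherwise.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MODEL_MAX_ORDER  9    /* assumption: chunks of at most 2 MiB */

/* Hypothetical get_mapping_order() policy (struct file * omitted). */
static int model_get_mapping_order(unsigned long pgoff, size_t len)
{
    unsigned long start = pgoff << MODEL_PAGE_SHIFT;
    unsigned long end = start + len;
    int order;

    for (order = MODEL_MAX_ORDER; order > 0; order--) {
        unsigned long chunk = 1UL << (order + MODEL_PAGE_SHIFT);
        /* First chunk boundary at or after the start of the range. */
        unsigned long first = (start + chunk - 1) & ~(chunk - 1);

        if (first < end && end - first >= chunk)
            return order;
    }
    return 0;  /* no useful hint: keep PAGE_SIZE alignment */
}
```

Returning 0 for small or awkward ranges keeps such mappings on the default PAGE_SIZE path, matching the semantics described above.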
>
> If the order is too big (PAGE_SIZE << order would overflow), the hint is
> ignored. Drivers are normally trusted, so this is just an extra safety
> measure.
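The sanity check on the returned order can be modeled in portable userspace C. This is a sketch under assumptions: the patch uses the kernel's check_shl_overflow(), which is not available outside the kernel, so the hypothetical model_order_to_align() re-implements the same idea by hand, assuming a 4 KiB page size.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Hypothetical model of the order -> alignment conversion in the patch. */
static unsigned long model_order_to_align(int order)
{
    const int bits = (int)(sizeof(unsigned long) * CHAR_BIT);
    unsigned long align;

    if (order < 0)                 /* negative order is treated as 0 */
        order = 0;
    if (order >= bits)             /* shift count out of range */
        return MODEL_PAGE_SIZE;

    align = MODEL_PAGE_SIZE << order;
    if ((align >> order) != MODEL_PAGE_SIZE)  /* high bits shifted out */
        return MODEL_PAGE_SIZE;   /* overflow: no extra alignment */

    return align;
}
```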
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> Documentation/filesystems/vfs.rst | 4 +++
> include/linux/fs.h | 1 +
> mm/mmap.c | 59 +++++++++++++++++++++++++++----
> 3 files changed, 57 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 4f13b01e42eb5..b707ddbebbf52 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -1069,6 +1069,7 @@ This describes how the VFS can manipulate an open file. As of kernel
> int (*fasync) (int, struct file *, int);
> int (*lock) (struct file *, int, struct file_lock *);
> unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
> + int (*get_mapping_order)(struct file *, unsigned long, size_t);
> int (*check_flags)(int);
> int (*flock) (struct file *, int, struct file_lock *);
> ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
> @@ -1165,6 +1166,9 @@ otherwise noted.
> ``get_unmapped_area``
> called by the mmap(2) system call
>
> +``get_mapping_order``
> + called by the mmap(2) system call to get mapping order hint
> +
> ``check_flags``
> called by the fcntl(2) system call for F_SETFL command
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index dd3b57cfadeeb..5ba373576bfe5 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2287,6 +2287,7 @@ struct file_operations {
> int (*fasync) (int, struct file *, int);
> int (*lock) (struct file *, int, struct file_lock *);
> unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
> + int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
> int (*check_flags)(int);
> int (*flock) (struct file *, int, struct file_lock *);
> ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 8fa397a18252e..be3dd0623f00c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -808,6 +808,33 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
> return arch_get_unmapped_area(filp, addr, len, pgoff, flags, vm_flags);
> }
>
> +static inline bool file_has_mmap_order_hint(struct file *file)
> +{
> + return file && file->f_op && file->f_op->get_mapping_order;
> +}
> +
> +static inline bool
> +mmap_should_align(struct file *file, unsigned long addr, unsigned long len)
> +{
> + /* When THP not enabled at all, skip */
> + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> + return false;
> +
> + /* Never try any alignment if the mmap() address hint is provided */
> + if (addr)
> + return false;
> +
> + /* Anonymous mappings can use PMD (THP) alignment when len is PMD-aligned */
> + if (!file)
> + return IS_ALIGNED(len, PMD_SIZE);
> +
> + /*
> + * File mapping with no address hint from the caller: only try to
> + * align if the file backend provides an order hint
> + */
> + return file_has_mmap_order_hint(file);
> +}
> +
> unsigned long
> __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
> @@ -815,8 +842,9 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long (*get_area)(struct file *, unsigned long,
> unsigned long, unsigned long, unsigned long)
> = NULL;
> -
> unsigned long error = arch_mmap_check(addr, len, flags);
> + unsigned long align;
> +
> if (error)
> return error;
>
> @@ -841,13 +869,30 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>
> if (get_area) {
> addr = get_area(file, addr, len, pgoff, flags);
> - } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
> - && !addr /* no hint */
> - && IS_ALIGNED(len, PMD_SIZE)) {
> - /* Ensures that larger anonymous mappings are THP aligned. */
> + } else if (mmap_should_align(file, addr, len)) {
> + if (file_has_mmap_order_hint(file)) {
> + int order;
> + /*
> + * Allow driver to opt-in on the order hint.
> + *
> + * Sanity-check the returned order: a negative or
> + * overflowing (too big) order is invalid, and no
> + * extra alignment is applied in that case.
> + */
> + order = file->f_op->get_mapping_order(file, pgoff, len);
> + if (order < 0)
> + order = 0;
> + if (check_shl_overflow(PAGE_SIZE, order, &align))
> + /* No alignment applied */
> + align = PAGE_SIZE;
> + } else {
> + /* Default alignment for anonymous THPs */
> + align = PMD_SIZE;
> + }
> +
> addr = thp_get_unmapped_area_vmflags(file, addr, len,
> - pgoff, flags, PMD_SIZE,
> - vm_flags);
> + pgoff, flags,
> + align, vm_flags);
> } else {
> addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
> pgoff, flags, vm_flags);
> --
> 2.50.1
>
--
Peter Xu
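The decision the patched __get_unmapped_area() makes when the get_area branch is not taken can be summarized with a userspace model. It assumes CONFIG_TRANSPARENT_HUGEPAGE is enabled, 4 KiB pages and 2 MiB PMDs; struct model_file and model_pick_align() are hypothetical stand-ins. A return of 0 means the plain mm_get_unmapped_area_vmflags() path is used; any other value is the alignment passed to thp_get_unmapped_area_vmflags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL                    /* assumption */
#define MODEL_PMD_SIZE  (512UL * MODEL_PAGE_SIZE) /* assumption: 2 MiB */

/* Hypothetical stand-in for the relevant bits of struct file. */
struct model_file {
    bool has_order_hint;  /* f_op->get_mapping_order != NULL */
    int hint_order;       /* what get_mapping_order() would return */
};

/* Alignment passed to the THP path, or 0 for the default path. */
static unsigned long model_pick_align(const struct model_file *file,
                                      unsigned long addr, unsigned long len)
{
    int order;

    if (addr)                       /* caller supplied an address hint */
        return 0;
    if (!file)                      /* anonymous: PMD-multiple lengths only */
        return len % MODEL_PMD_SIZE == 0 ? MODEL_PMD_SIZE : 0;
    if (!file->has_order_hint)      /* file without the new callback */
        return 0;

    order = file->hint_order < 0 ? 0 : file->hint_order;
    /* Overflowing shift: no extra alignment, as in the patch. */
    if (order >= (int)(sizeof(unsigned long) * 8) ||
        ((MODEL_PAGE_SIZE << order) >> order) != MODEL_PAGE_SIZE)
        return MODEL_PAGE_SIZE;
    return MODEL_PAGE_SIZE << order;
}
```

Note that a backend returning 0 still takes the THP path, just with PAGE_SIZE alignment, which the commit message describes as equivalent to providing no hint.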
* Re: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()
From: Matthew Wilcox @ 2025-12-08 9:21 UTC
To: Peter Xu
Cc: kvm, linux-mm, linux-kernel, Jason Gunthorpe, Nico Pache, Zi Yan,
Alex Mastro, David Hildenbrand, Alex Williamson, Zhi Wang,
David Laight, Yi Liu, Ankit Agrawal, Kevin Tian, Andrew Morton,
David Hildenbrand, Jonathan Corbet, linux-fsdevel, linux-doc,
Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
Pedro Falcato, Alexander Viro, Christian Brauner, Jan Kara,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko
On Thu, Dec 04, 2025 at 10:19:44AM -0500, Peter Xu wrote:
> > Add one new file operation, get_mapping_order(). It can be used by file
> > backends to report mapping order hints.
This seems like a terrible idea. I'll look at it after Plumbers.
* Re: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()
From: Peter Xu @ 2025-12-10 20:24 UTC
To: Matthew Wilcox
Cc: kvm, linux-mm, linux-kernel, Jason Gunthorpe, Nico Pache, Zi Yan,
Alex Mastro, David Hildenbrand, Alex Williamson, Zhi Wang,
David Laight, Yi Liu, Ankit Agrawal, Kevin Tian, Andrew Morton,
David Hildenbrand, Jonathan Corbet, linux-fsdevel, linux-doc,
Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
Pedro Falcato, Alexander Viro, Christian Brauner, Jan Kara,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko
On Mon, Dec 08, 2025 at 09:21:58AM +0000, Matthew Wilcox wrote:
> On Thu, Dec 04, 2025 at 10:19:44AM -0500, Peter Xu wrote:
> > > Add one new file operation, get_mapping_order(). It can be used by file
> > > backends to report mapping order hints.
>
> This seems like a terrible idea. I'll look at it after Plumbers.
Sure, no rush. When you get to it, please feel free to go through the
discussion on v1; that's where we landed on this API, based on suggestions
from Jason. I'm open to other suggestions.
--
Peter Xu