From: Peter Xu <peterx@redhat.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>, Nico Pache <npache@redhat.com>,
Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
David Hildenbrand <david@redhat.com>,
Alex Williamson <alex@shazbot.org>, Zhi Wang <zhiw@nvidia.com>,
David Laight <david.laight.linux@gmail.com>,
Yi Liu <yi.l.liu@intel.com>, Ankit Agrawal <ankita@nvidia.com>,
Kevin Tian <kevin.tian@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Jonathan Corbet <corbet@lwn.net>,
linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()
Date: Thu, 4 Dec 2025 10:19:44 -0500
Message-ID: <aTGmkHsRSsnneW0G@x1.local>
In-Reply-To: <20251204151003.171039-3-peterx@redhat.com>
I forgot to copy the mm/fs maintainers on the 1st/2nd patches in this series,
my apologies. The whole series can be found here:
https://lore.kernel.org/r/20251204151003.171039-1-peterx@redhat.com
I'll fix the Cc list when I repost.
Thanks,
On Thu, Dec 04, 2025 at 10:10:01AM -0500, Peter Xu wrote:
> Add one new file operation, get_mapping_order(). It can be used by file
> backends to report mapping order hints.
>
> By default, Linux assumes we will map in PAGE_SIZE chunks. With this hint,
> the driver can report the possibility of mapping chunks that are larger
> than PAGE_SIZE. Then, the VA allocator will try to use that as alignment
> when allocating the VA ranges.
>
> This is useful because when chunks to be mapped are larger than PAGE_SIZE,
> VA alignment matters and it needs to be aligned with the size of the chunk
> to be mapped.
>
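To illustrate the alignment point above, here is a minimal userspace sketch,
not code from this series. It assumes a power-of-two chunk size and picks the
lowest address in a free region that is congruent to the file offset modulo
that size, which is the condition that lets a whole chunk be covered by one
large mapping. The name sketch_align_va() and its parameters are hypothetical;
the in-kernel helper does more (padding the length, bounds checks).

```c
#include <stdbool.h>

/*
 * Hypothetical helper: return the lowest address >= region_start such
 * that (addr - file_off) is a multiple of align. align must be a power
 * of two. Unsigned wraparound in the subtraction is intentional.
 */
unsigned long sketch_align_va(unsigned long region_start,
			      unsigned long file_off,
			      unsigned long align)
{
	unsigned long delta = (file_off - region_start) & (align - 1);

	return region_start + delta;
}

/* True when addr and file_off share the same offset within a chunk. */
bool sketch_va_is_congruent(unsigned long addr, unsigned long file_off,
			    unsigned long align)
{
	return ((addr - file_off) & (align - 1)) == 0;
}
```

With a 2 MiB chunk (0x200000), a free region starting at 0x40001000 and file
offset 0, the sketch moves the start up to the next 2 MiB boundary. With a
page-granular alignment of 0x1000 the start is returned unchanged.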
> That said, no matter what alignment is used for the VA allocation, the
> driver can still decide what size to use when mapping the chunks. It is
> also not an issue if it keeps mapping in PAGE_SIZE.
>
> get_mapping_order() is defined to take three parameters. The 1st
> parameter is the file object pointer; the 2nd and 3rd parameters are
> the pgoff and size of the mmap() request. Its retval is defined as the
> order, which must be non-negative to enable the alignment. When zero is
> returned, it should behave as if the hint were not provided, IOW,
> alignment will still be PAGE_SIZE.
>
> When the order is too big, ignore the hint. Normally drivers are trusted,
> so this is more of an extra safety measure.
>
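The order handling described above (negative treated as zero, too-big order
falling back to PAGE_SIZE) can be modeled in userspace roughly as follows.
This is a sketch under assumptions: SKETCH_PAGE_SIZE is fixed at 4 KiB, and
sketch_shl_overflows() is a hypothetical stand-in for the kernel's
check_shl_overflow(), which is not available outside the kernel.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB base pages */

/*
 * Hypothetical stand-in for check_shl_overflow(): returns true when
 * (base << order) does not fit in an unsigned long, otherwise stores
 * the shifted value in *out and returns false.
 */
static bool sketch_shl_overflows(unsigned long base, int order,
				 unsigned long *out)
{
	const int bits = (int)(sizeof(unsigned long) * CHAR_BIT);

	if (order < 0 || order >= bits || base > (ULONG_MAX >> order))
		return true;
	*out = base << order;
	return false;
}

/* Convert a driver-reported order into an alignment in bytes. */
unsigned long sketch_order_to_align(int order)
{
	unsigned long align;

	/* A negative order is treated as "no hint" */
	if (order < 0)
		order = 0;
	/* An order that overflows is ignored: no extra alignment */
	if (sketch_shl_overflows(SKETCH_PAGE_SIZE, order, &align))
		align = SKETCH_PAGE_SIZE;
	return align;
}
```

Under these assumptions order 9 gives 2 MiB, while orders such as -1 or 200
both end up at the base page size, so a bad hint only costs the alignment
and never produces a bogus one.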
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> Documentation/filesystems/vfs.rst | 4 +++
> include/linux/fs.h | 1 +
> mm/mmap.c | 59 +++++++++++++++++++++++++++----
> 3 files changed, 57 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 4f13b01e42eb5..b707ddbebbf52 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -1069,6 +1069,7 @@ This describes how the VFS can manipulate an open file. As of kernel
> int (*fasync) (int, struct file *, int);
> int (*lock) (struct file *, int, struct file_lock *);
> unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
> + int (*get_mapping_order)(struct file *, unsigned long, size_t);
> int (*check_flags)(int);
> int (*flock) (struct file *, int, struct file_lock *);
> ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
> @@ -1165,6 +1166,9 @@ otherwise noted.
> ``get_unmapped_area``
> called by the mmap(2) system call
>
> +``get_mapping_order``
> + called by the mmap(2) system call to get mapping order hint
> +
> ``check_flags``
> called by the fcntl(2) system call for F_SETFL command
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index dd3b57cfadeeb..5ba373576bfe5 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2287,6 +2287,7 @@ struct file_operations {
> int (*fasync) (int, struct file *, int);
> int (*lock) (struct file *, int, struct file_lock *);
> unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
> + int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
> int (*check_flags)(int);
> int (*flock) (struct file *, int, struct file_lock *);
> ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 8fa397a18252e..be3dd0623f00c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -808,6 +808,33 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
> return arch_get_unmapped_area(filp, addr, len, pgoff, flags, vm_flags);
> }
>
> +static inline bool file_has_mmap_order_hint(struct file *file)
> +{
> + return file && file->f_op && file->f_op->get_mapping_order;
> +}
> +
> +static inline bool
> +mmap_should_align(struct file *file, unsigned long addr, unsigned long len)
> +{
> + /* When THP not enabled at all, skip */
> + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> + return false;
> +
> + /* Never try any alignment if the mmap() address hint is provided */
> + if (addr)
> + return false;
> +
> + /* Anonymous THP could use some better alignment when len aligned */
> + if (!file)
> + return IS_ALIGNED(len, PMD_SIZE);
> +
> + /*
> + * It's a file mapping, no address hint provided by caller, try any
> + * alignment if the file backend would provide a hint
> + */
> + return file_has_mmap_order_hint(file);
> +}
> +
> unsigned long
> __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
> @@ -815,8 +842,9 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long (*get_area)(struct file *, unsigned long,
> unsigned long, unsigned long, unsigned long)
> = NULL;
> -
> unsigned long error = arch_mmap_check(addr, len, flags);
> + unsigned long align;
> +
> if (error)
> return error;
>
> @@ -841,13 +869,30 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>
> if (get_area) {
> addr = get_area(file, addr, len, pgoff, flags);
> - } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
> - && !addr /* no hint */
> - && IS_ALIGNED(len, PMD_SIZE)) {
> - /* Ensures that larger anonymous mappings are THP aligned. */
> + } else if (mmap_should_align(file, addr, len)) {
> + if (file_has_mmap_order_hint(file)) {
> + int order;
> + /*
> + * Allow driver to opt-in on the order hint.
> + *
> + * Sanity check on the order returned. Treating
> + * either negative or too big order to be invalid,
> + * where alignment will be skipped.
> + */
> + order = file->f_op->get_mapping_order(file, pgoff, len);
> + if (order < 0)
> + order = 0;
> + if (check_shl_overflow(PAGE_SIZE, order, &align))
> + /* No alignment applied */
> + align = PAGE_SIZE;
> + } else {
> + /* Default alignment for anonymous THPs */
> + align = PMD_SIZE;
> + }
> +
> addr = thp_get_unmapped_area_vmflags(file, addr, len,
> - pgoff, flags, PMD_SIZE,
> - vm_flags);
> + pgoff, flags,
> + align, vm_flags);
> } else {
> addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
> pgoff, flags, vm_flags);
> --
> 2.50.1
>
--
Peter Xu