From: "Cédric Le Goater" <clg@redhat.com>
To: Peter Xu <peterx@redhat.com>,
kvm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>, Nico Pache <npache@redhat.com>,
Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
David Hildenbrand <david@redhat.com>,
Alex Williamson <alex@shazbot.org>, Zhi Wang <zhiw@nvidia.com>,
David Laight <david.laight.linux@gmail.com>,
Yi Liu <yi.l.liu@intel.com>, Ankit Agrawal <ankita@nvidia.com>,
Kevin Tian <kevin.tian@intel.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 0/4] mm/vfio: huge pfnmaps with !MAP_FIXED mappings
Date: Thu, 4 Dec 2025 19:16:54 +0100
Message-ID: <e2033095-9bf1-4d9c-9a5b-01148eaffc30@redhat.com>
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
On 12/4/25 16:09, Peter Xu wrote:
> This series is based on v6.18. It allows mmap(!MAP_FIXED) to work with
> huge pfnmaps with best effort. Meanwhile, it enables it for vfio-pci as
> the first user.
>
> v1: https://lore.kernel.org/r/20250613134111.469884-1-peterx@redhat.com
>
> A changelog may not apply because all the patches were rewritten based on a
> new interface that this v2 introduces. Hence it is omitted.
>
> In this version, a new file operation, get_mapping_order(), is introduced
> (based on discussion with Jason on v1) to minimize the code needed for
> drivers to implement this. It also helps avoid exporting any mm functions.
> One can refer to the discussion in v1 for more information.
>
> Currently, the get_mapping_order() API is defined as:
>
> int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);
>
> The first argument is the file pointer; the 2nd and 3rd are the pgoff and
> len specified by an mmap() request. The driver can use this interface to
> opt in to providing mapping order hints to core mm on VA allocations for
> the specified range of the file. I kept the interface simple for now, so
> that core mm will always do the alignment with pgoff, assuming that would
> always work. The driver can only report the order from pgoff+len, which
> will be used to do the alignment.
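
To make the hook's contract concrete, here is a minimal userspace sketch of the kind of decision a driver could make from pgoff+len. Everything in it is an assumption for illustration: demo_get_mapping_order() is a hypothetical name, the struct file argument is dropped, the returned order is assumed to be a page order, and the PMD/PUD orders are the x86-64 values with 4 KiB pages. It is not the vfio-pci implementation from this series.

```c
#include <stddef.h>

/* Assumed x86-64-style geometry; real values are per-architecture. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PMD_ORDER  9   /* 2 MiB / 4 KiB */
#define DEMO_PUD_ORDER  18  /* 1 GiB / 4 KiB */

/*
 * Hypothetical stand-in for a driver's get_mapping_order() hook, without
 * the struct file argument. Returns the largest page order whose size fits
 * in len and to which pgoff is naturally aligned, or 0 for small mappings.
 */
static int demo_get_mapping_order(unsigned long pgoff, size_t len)
{
    static const int orders[] = { DEMO_PUD_ORDER, DEMO_PMD_ORDER };
    size_t i;

    for (i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
        unsigned long pages = 1UL << orders[i];

        /* The range must hold at least one huge page, and start on one. */
        if ((len >> DEMO_PAGE_SHIFT) >= pages && (pgoff & (pages - 1)) == 0)
            return orders[i];
    }
    return 0;
}
```

Under these assumptions, a 128 MiB range at pgoff 0 yields order 9 (PMD), and a range of 1 GiB or more at an aligned pgoff yields order 18 (PUD).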
>
> Before this series, a userspace app in most cases needs to be modified to
> benefit from huge mappings, by providing a huge-size-aligned VA using
> MAP_FIXED. After this series, the app can benefit from huge pfnmaps
> automatically after a kernel upgrade, with no userspace modifications.
>
> It's still best-effort, because the auto-alignment requires a larger VA
> range to be allocated via the per-arch allocator; if the huge-mapping
> aligned VA cannot be allocated, it will still fall back to small mappings
> like before. However, that's only in theory: in reality I don't yet know
> when it would fail, especially on a 64-bit system.
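
The "larger VA range" above can be illustrated with the usual padding arithmetic: ask the allocator for len plus one huge-page size, then slide the start forward until it is congruent to the file offset modulo that size. The helper below is only a sketch of that arithmetic; demo_align_va() is a hypothetical name, and the series' actual code in mm/mmap.c may differ.

```c
#include <stdint.h>

/*
 * Given the start of a VA range that was over-allocated by `size` bytes,
 * return the first address in it that is congruent to file_off modulo
 * size. size must be a power of two.
 */
static uint64_t demo_align_va(uint64_t base, uint64_t file_off, uint64_t size)
{
    /* Distance from base to the next address matching file_off mod size. */
    uint64_t pad = (file_off - base) & (size - 1);

    return base + pad;
}
```

Because pad is always smaller than size, the aligned start plus len still fits inside the padded allocation. If the padded request itself cannot be satisfied, the only option left is the unaligned small-mapping path described above.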
>
> So far, only vfio-pci is supported. But the logic should be applicable to
> all the drivers that support or will support huge pfnmaps. I've also
> copied some more people on this version for the hardware perspective.
>
> For testing:
>
> - checkpatch.pl
> - cross build harness
> - unit test that I got from Alex [1], checking mmap() alignments on a QEMU
> instance with a 128MB BAR.
>
> The alignments all look sane with mmap(!MAP_FIXED), and huge mappings are
> properly installed. I didn't observe anything wrong.
>
> I currently lack larger BARs to test PUD sizes. Please kindly report if
> you can run this with 1G+ BARs and hit issues.
LGTM, with a 32G BAR:
Using device 0000:02:00.0 in IOMMU group 27
Device 0000:02:00.0 supports 9 regions, 5 irqs
[BAR0]: size 0x1000000, order 24, offset 0x0, flags 0xf
Testing BAR0, require at least 21 bit alignment
[PASS] Minimum alignment 21
Testing random offset
[PASS] Random offset
Testing random size
[PASS] Random size
[BAR1]: size 0x800000000, order 35, offset 0x10000000000, flags 0x7
Testing BAR1, require at least 30 bit alignment
[PASS] Minimum alignment 31
Testing random offset
[PASS] Random offset
Testing random size
[PASS] Random size
[BAR3]: size 0x2000000, order 25, offset 0x30000000000, flags 0x7
Testing BAR3, require at least 21 bit alignment
[PASS] Minimum alignment 21
Testing random offset
[PASS] Random offset
Testing random size
[PASS] Random size
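
For readers of the output above, the "alignment" figures are given in bits. Assuming that means the number of trailing zero bits of the address returned by mmap() (the actual test in [1] may compute it differently), the check can be sketched as follows; demo_alignment_bits() is a hypothetical helper, not code from the test.

```c
#include <stdint.h>

/*
 * Return how many low-order bits of addr are zero, i.e. the largest n such
 * that addr is a multiple of 2^n. An address of 0 is reported as 64.
 */
static int demo_alignment_bits(uint64_t addr)
{
    int bits = 0;

    if (addr == 0)
        return 64;

    while ((addr & 1) == 0) {
        addr >>= 1;
        bits++;
    }
    return bits;
}
```

With that reading, a 2 MiB (PMD) aligned address has at least 21 such bits and a 1 GiB (PUD) aligned address at least 30, so a reported 31 satisfies a 30-bit requirement.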
C.
>
> Alex Mastro: thanks for the testing offered in v1, but since this series
> was rewritten, a re-test will be needed. I hence didn't collect the T-b.
>
> Comments welcome, thanks.
>
> [1] https://github.com/awilliam/tests/blob/vfio-pci-device-map-alignment/vfio-pci-device-map-alignment.c
>
> Peter Xu (4):
> mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment
> mm: Add file_operations.get_mapping_order()
> vfio: Introduce vfio_device_ops.get_mapping_order hook
> vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
>
> Documentation/filesystems/vfs.rst | 4 +++
> drivers/vfio/pci/vfio_pci.c | 1 +
> drivers/vfio/pci/vfio_pci_core.c | 49 ++++++++++++++++++++++++++
> drivers/vfio/vfio_main.c | 14 ++++++++
> include/linux/fs.h | 1 +
> include/linux/huge_mm.h | 5 +--
> include/linux/vfio.h | 5 +++
> include/linux/vfio_pci_core.h | 2 ++
> mm/huge_memory.c | 7 ++--
> mm/mmap.c | 58 +++++++++++++++++++++++++++----
> 10 files changed, 135 insertions(+), 11 deletions(-)
>
Thread overview: 17+ messages
2025-12-04 15:09 [PATCH v2 0/4] mm/vfio: huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-12-04 15:10 ` [PATCH v2 1/4] mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment Peter Xu
2025-12-04 15:10 ` [PATCH v2 2/4] mm: Add file_operations.get_mapping_order() Peter Xu
2025-12-04 15:19 ` Peter Xu
2025-12-08 9:21 ` Matthew Wilcox
2025-12-10 20:24 ` Peter Xu
2025-12-07 16:21 ` Jason Gunthorpe
2025-12-10 20:23 ` Peter Xu
2025-12-04 15:10 ` [PATCH v2 3/4] vfio: Introduce vfio_device_ops.get_mapping_order hook Peter Xu
2025-12-04 15:10 ` [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-12-05 4:33 ` kernel test robot
2025-12-05 7:45 ` kernel test robot
2025-12-07 16:26 ` Jason Gunthorpe
2025-12-10 20:43 ` Peter Xu
2025-12-08 3:11 ` Alex Mastro
2025-12-04 18:16 ` Cédric Le Goater [this message]
2025-12-07 9:13 ` [PATCH v2 0/4] mm/vfio: " Alex Mastro