From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Anthony Pighin <anthony.pighin@nokia.com>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Vlastimil Babka <vbabka@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
kvm@vger.kernel.org, Matthew Wilcox <willy@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: [PATCH] vfio: Request THP-aligned mmap for device fds
Date: Wed, 17 Jun 2026 10:21:40 -0400
Message-ID: <ajKtdCN0AlbmBnAj@x1.local>
In-Reply-To: <20260616163054.77fdb61a@shazbot.org>

On Tue, Jun 16, 2026 at 04:30:54PM -0600, Alex Williamson wrote:
> On Tue, 16 Jun 2026 14:01:29 -0400
> Anthony Pighin <anthony.pighin@nokia.com> wrote:
>
> > VFIO PCI devices support PMD-sized page table entries for BAR mappings
> > via their huge_fault handler (vfio_pci_mmap_huge_fault). However, the
> > VFIO device file_operations never provided a get_unmapped_area callback
> > to request PMD-aligned virtual address placement from the mmap address
> > allocator.
> >
> > Before commit 34d7cf637c43 ("mm: don't try THP alignment for FS without
> > get_unmapped_area"), this was masked by a bug introduced in commit
> > ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") which
> > inadvertently applied THP alignment to all file-backed mappings,
> > regardless of whether they provided a get_unmapped_area callback.
> >
> > When commit 34d7cf637c43 ("mm: don't try THP alignment for FS without
> > get_unmapped_area") correctly restricted THP alignment to anonymous
> > mappings and files that explicitly opt in via get_unmapped_area, VFIO BAR
> > mappings lost their PMD-aligned placement. Since the huge_fault handler
> > requires both the VMA start address and the physical PFN to be
> > PMD-aligned, unaligned VMAs force a fallback to 4KB page faults.
> >
> > For example, a 2GiB BAR results in 524,288 individual page faults
> > instead of 1,024 PMD-sized faults, increasing the VFIO_IOMMU_MAP_DMA
> > pinning time by orders of magnitude -- a regression directly visible to
> > KVM guests during PCI device initialization.
> >
> > Fix this by providing a get_unmapped_area callback in vfio_device_fops,
> > following the same pattern used by ext4, xfs, btrfs, fuse, and other
> > subsystems that benefit from THP-aligned placement.
>
> The trouble is that PMD alignment isn't right either, your 1024 PMD
> faults on a 2GiB BAR would be 2 faults on x86_64 with PUD mappings.
> QEMU has forced the alignment to make it optimal for some time[1], so
> there are userspace VMM options. Seems like you were previously
> getting lucky.
>
> Peter Xu was working on a more comprehensive solution[2] late last
> year, but it seems there was an objection to the
> file_operations.get_mapping_order() proposal before Plumbers and the
> thread hasn't rekindled.
>
> Gentle bump to Peter and Willy that maybe we could resurrect that
> effort. Thanks,

Yes. Since QEMU doesn't need it, this was low priority on my list (also
because of a lot more downstream work recently, and a lot of other things
happening). I can definitely try again.

I'll wait another 1-2 weeks in case Matthew would like to suggest a
better approach; otherwise I can send a new version based on what we
have, and we can start from there.

Thanks,
--
Peter Xu

Thread overview (4+ messages):
2026-06-16 18:01 [PATCH] vfio: Request THP-aligned mmap for device fds Anthony Pighin
2026-06-16 22:30 ` Alex Williamson
2026-06-17 14:21 ` Peter Xu [this message]
2026-06-17 18:34 ` Matthew Wilcox