From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Anthony Pighin <anthony.pighin@nokia.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	kvm@vger.kernel.org, Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: [PATCH] vfio: Request THP-aligned mmap for device fds
Date: Wed, 17 Jun 2026 10:21:40 -0400
Message-ID: <ajKtdCN0AlbmBnAj@x1.local>
In-Reply-To: <20260616163054.77fdb61a@shazbot.org>

On Tue, Jun 16, 2026 at 04:30:54PM -0600, Alex Williamson wrote:
> On Tue, 16 Jun 2026 14:01:29 -0400
> Anthony Pighin <anthony.pighin@nokia.com> wrote:
> 
> > VFIO PCI devices support PMD-sized page table entries for BAR mappings
> > via their huge_fault handler (vfio_pci_mmap_huge_fault).  However, the
> > VFIO device file_operations never provided a get_unmapped_area callback
> > to request PMD-aligned virtual address placement from the mmap address
> > allocator.
> > 
> > Before commit 34d7cf637c43 ("mm: don't try THP alignment for FS without
> > get_unmapped_area"), this was masked by a bug introduced in commit
> > ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") which
> > inadvertently applied THP alignment to all file-backed mappings,
> > regardless of whether they provided a get_unmapped_area callback.
> > 
> > When commit 34d7cf637c43 ("mm: don't try THP alignment for FS without
> > get_unmapped_area") correctly restricted THP alignment to anonymous
> > mappings and files that explicitly opt in via get_unmapped_area, VFIO BAR
> > mappings lost their PMD-aligned placement.  Since the huge_fault handler
> > requires both the VMA start address and the physical PFN to be
> > PMD-aligned, unaligned VMAs force a fallback to 4KB page faults.
> > 
> > For example, a 2GiB BAR results in 524,288 individual page faults
> > instead of 1,024 PMD-sized faults, increasing the VFIO_IOMMU_MAP_DMA
> > pinning time by orders of magnitude -- a regression directly visible to
> > KVM guests during PCI device initialization.
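
The fault counts quoted above follow directly from dividing the BAR size by the size each page table entry covers. A minimal sketch of that arithmetic, assuming 4KiB base pages and 2MiB PMD entries as on x86_64 (the helper name faults_needed is made up for illustration and is not a kernel function):

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of faults needed to
 * populate a mapping of `len` bytes when every fault installs one entry
 * covering `entry_size` bytes. Rounds up for a partial trailing entry.
 */
static uint64_t faults_needed(uint64_t len, uint64_t entry_size)
{
	return (len + entry_size - 1) / entry_size;
}
```

With len = 2GiB this gives 524,288 for 4KiB entries and 1,024 for 2MiB entries, matching the numbers in the commit message. The same helper with a 1GiB entry size gives 2.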
> > 
> > Fix this by providing a get_unmapped_area callback in vfio_device_fops,
> > following the same pattern used by ext4, xfs, btrfs, fuse, and other
> > subsystems that benefit from THP-aligned placement.
> 
> The trouble is that PMD alignment isn't right either: your 1024 PMD
> faults on a 2GiB BAR would be 2 faults on x86_64 with PUD mappings.
> QEMU has forced the alignment to make it optimal for some time[1], so
> there are userspace VMM options.  Seems like you were previously
> getting lucky.
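
For readers wondering what forcing the alignment from userspace can look like, one common approach is to over-reserve address space, round the start up to the wanted boundary, and trim the excess. The sketch below shows that general technique only; it is not taken from QEMU, and reserve_aligned is a hypothetical name. It assumes Linux, that `align` is a power of two no smaller than the page size, and that `len` is a multiple of the page size.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only: reserve `len` bytes of
 * address space starting at a multiple of `align`.
 * Returns MAP_FAILED on error.
 */
static void *reserve_aligned(size_t len, size_t align)
{
	size_t span = len + align;
	void *base;
	uintptr_t start;
	size_t head, tail;

	if (span < len)			/* size overflow */
		return MAP_FAILED;

	/* Over-reserve so that an aligned start must exist inside the span. */
	base = mmap(NULL, span, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	/* Round the start up to the next multiple of align. */
	start = ((uintptr_t)base + align - 1) & ~((uintptr_t)align - 1);
	head = start - (uintptr_t)base;
	tail = span - head - len;

	/* Give back the unused pieces before and after the aligned window. */
	if (head)
		munmap(base, head);
	if (tail)
		munmap((void *)(start + len), tail);

	return (void *)start;
}
```

A VMM could then mmap() the device fd over the returned window with MAP_FIXED, choosing `align` to match the largest mapping size it hopes to get (for example 1GiB for PUD mappings on x86_64). The physical side still has to be aligned as well for huge faults to apply.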
> 
> Peter Xu was working on a more comprehensive solution[2] late last
> year, but it seems there was an objection to the
> file_operations.get_mapping_order() proposal before Plumbers and the
> thread hasn't rekindled.
> 
> Gentle bump to Peter and Willy that maybe we could resurrect that
> effort.  Thanks,

Yes, since QEMU doesn't need it, it was low priority on my list (also because
of a lot more downstream work recently, and a lot of things happened).

I can definitely try again.

I'll wait another 1-2 weeks in case Matthew would like to suggest something
better; otherwise I can send a new version based on what we have, and we can
start from there.

Thanks,

-- 
Peter Xu


