From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
Alex Williamson <alex.williamson@redhat.com>,
Zi Yan <ziy@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>,
Alex Mastro <amastro@fb.com>, Nico Pache <npache@redhat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Fri, 13 Jun 2025 15:21:39 -0400 [thread overview]
Message-ID: <aEx6Qyl3cgiarXZD@x1.local> (raw)
In-Reply-To: <d6fbee39-a38f-4f94-bffb-938f7be73681@redhat.com>
On Fri, Jun 13, 2025 at 08:09:41PM +0200, David Hildenbrand wrote:
> On 13.06.25 15:41, Peter Xu wrote:
> > This patch enables best-effort mmap() for vfio-pci bars even without
> > MAP_FIXED, so as to utilize huge pfnmaps as much as possible. It should
> > also avoid userspace changes (switching to MAP_FIXED with pre-aligned VA
> > addresses) to start enabling huge pfnmaps on VFIO bars.
> >
> > Here the trick is making sure the MMIO PFNs will be aligned with the VAs
> > allocated from mmap() when !MAP_FIXED, so that whatever returned from
> > mmap(!MAP_FIXED) of vfio-pci MMIO regions will be automatically suitable
> > for huge pfnmaps as much as possible.
> >
> > To achieve that, a custom vfio_device's get_unmapped_area() for vfio-pci
> > devices is needed.
> >
> > Note that MMIO physical addresses should normally be guaranteed to be
> > always bar-size aligned, hence the bar offset can logically be directly
> > used to do the calculation. However to make it strict and clear (rather
> > than relying on spec details), we still try to fetch the bar's physical
> > addresses from pci_dev.resource[].
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>
> There is likely a
>
> Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
>
> missing?
Would it mean the same if we use the two SoBs like what this patch uses?
I sincerely don't know the difference.. I hope it's fine to show that this
patch was developed together. Please let me know otherwise.
>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > drivers/vfio/pci/vfio_pci.c | 3 ++
> > drivers/vfio/pci/vfio_pci_core.c | 65 ++++++++++++++++++++++++++++++++
> > include/linux/vfio_pci_core.h | 6 +++
> > 3 files changed, 74 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 5ba39f7623bb..d9ae6cdbea28 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -144,6 +144,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
> > .detach_ioas = vfio_iommufd_physical_detach_ioas,
> > .pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas,
> > .pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
> > +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> > + .get_unmapped_area = vfio_pci_core_get_unmapped_area,
> > +#endif
> > };
> > static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 6328c3a05bcd..835bc168f8b7 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1641,6 +1641,71 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> > return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
> > }
> > +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> > +/*
> > + * Hint function to provide mmap() virtual address candidate so as to be
> > + * able to map huge pfnmaps as much as possible. It is done by aligning
> > + * the VA to the PFN to be mapped in the specific bar.
> > + *
> > + * Note that this function does the minimum check on mmap() parameters to
> > + * make the PFN calculation valid only. The majority of mmap() sanity check
> > + * will be done later in mmap().
> > + */
> > +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> > + struct file *file,
> > + unsigned long addr,
> > + unsigned long len,
> > + unsigned long pgoff,
> > + unsigned long flags)
>
> A very suboptimal way to indent this many parameters; just use two tabs at
> the beginning.
This is the default indentation from Emacs c-mode.
Since this is a VFIO file, I checked the file and looks like there's not
yet a strict rule of indentation across the whole file. I can switch to
two-tabs for sure if nobody else disagrees.
>
> > +{
> > + struct vfio_pci_core_device *vdev =
> > + container_of(device, struct vfio_pci_core_device, vdev);
> > + struct pci_dev *pdev = vdev->pdev;
> > + unsigned long ret, phys_len, req_start, phys_addr;
> > + unsigned int index;
> > +
> > + index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>
> Could do
>
> unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>
> at the very top.
Sure.
>
> > +
> > + /* Currently, only bars 0-5 supports huge pfnmap */
> > + if (index >= VFIO_PCI_ROM_REGION_INDEX)
> > + goto fallback;
> > +
> > + /* Bar offset */
> > + req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
> > + phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
> > +
> > + /*
> > + * Make sure we at least can get a valid physical address to do the
> > + * math. If this happens, it will probably fail mmap() later..
> > + */
> > + if (req_start >= phys_len)
> > + goto fallback;
> > +
> > + phys_len = MIN(phys_len, len);
> > + /* Calculate the start of physical address to be mapped */
> > + phys_addr = pci_resource_start(pdev, index) + req_start;
> > +
> > + /* Choose the alignment */
> > + if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE) {
> > + ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> > + flags, PUD_SIZE, 0);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + if (phys_len >= PMD_SIZE) {
> > + ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> > + flags, PMD_SIZE, 0);
> > + if (ret)
> > + return ret;
>
> Similar to Jason, I wonder if that logic should reside in the core, and we
> only indicate the maximum page table level we support.
I replied. We can continue the discussion there.
Thanks,
--
Peter Xu
prev parent reply other threads:[~2025-06-13 19:21 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 13:41 [PATCH 0/5] mm/vfio: huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-06-13 13:41 ` [PATCH 1/5] mm: Deduplicate mm_get_unmapped_area() Peter Xu
2025-06-13 14:12 ` Jason Gunthorpe
2025-06-13 14:55 ` Oscar Salvador
2025-06-13 14:58 ` Zi Yan
2025-06-13 15:57 ` Lorenzo Stoakes
2025-06-13 17:00 ` Pedro Falcato
2025-06-13 18:00 ` David Hildenbrand
2025-06-16 8:01 ` David Laight
2025-06-17 21:13 ` Peter Xu
2025-06-13 13:41 ` [PATCH 2/5] mm/hugetlb: Remove prepare_hugepage_range() Peter Xu
2025-06-13 14:12 ` Jason Gunthorpe
2025-06-13 14:59 ` Oscar Salvador
2025-06-13 15:13 ` Zi Yan
2025-06-13 16:24 ` Peter Xu
2025-06-13 18:01 ` David Hildenbrand
2025-06-14 4:11 ` Liam R. Howlett
2025-06-17 21:07 ` Peter Xu
2025-06-13 13:41 ` [PATCH 3/5] mm: Rename __thp_get_unmapped_area to mm_get_unmapped_area_aligned Peter Xu
2025-06-13 14:17 ` Jason Gunthorpe
2025-06-13 15:13 ` Peter Xu
2025-06-13 16:00 ` Jason Gunthorpe
2025-06-13 18:31 ` Peter Xu
2025-06-13 15:19 ` Zi Yan
2025-06-13 18:33 ` Peter Xu
2025-06-13 15:36 ` Lorenzo Stoakes
2025-06-13 18:45 ` Peter Xu
2025-06-13 19:18 ` Lorenzo Stoakes
2025-06-13 20:34 ` Peter Xu
2025-06-14 5:58 ` Lorenzo Stoakes
2025-06-14 5:23 ` Liam R. Howlett
2025-06-16 12:14 ` Jason Gunthorpe
2025-06-16 12:20 ` Lorenzo Stoakes
2025-06-16 12:26 ` Jason Gunthorpe
2025-06-13 13:41 ` [PATCH 4/5] vfio: Introduce vfio_device_ops.get_unmapped_area hook Peter Xu
2025-06-13 14:18 ` Jason Gunthorpe
2025-06-13 18:03 ` David Hildenbrand
2025-06-14 14:46 ` kernel test robot
2025-06-17 15:39 ` Peter Xu
2025-06-17 15:41 ` Jason Gunthorpe
2025-06-17 16:47 ` Peter Xu
2025-06-17 19:39 ` Peter Xu
2025-06-17 19:46 ` Jason Gunthorpe
2025-06-17 20:01 ` Peter Xu
2025-06-17 23:00 ` Jason Gunthorpe
2025-06-17 23:26 ` Peter Xu
2025-06-13 13:41 ` [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-06-13 14:29 ` Jason Gunthorpe
2025-06-13 15:26 ` Peter Xu
2025-06-13 16:09 ` Jason Gunthorpe
2025-06-13 19:15 ` Peter Xu
2025-06-13 23:16 ` Jason Gunthorpe
2025-06-16 22:06 ` Peter Xu
2025-06-16 23:00 ` Jason Gunthorpe
2025-06-17 20:56 ` Peter Xu
2025-06-17 23:18 ` Jason Gunthorpe
2025-06-17 23:36 ` Peter Xu
2025-06-18 16:56 ` Peter Xu
2025-06-18 17:46 ` Jason Gunthorpe
2025-06-18 19:15 ` Peter Xu
2025-06-19 13:58 ` Jason Gunthorpe
2025-06-19 14:55 ` Peter Xu
2025-06-19 18:40 ` Jason Gunthorpe
2025-06-24 20:37 ` Peter Xu
2025-06-24 20:51 ` Peter Xu
2025-06-24 23:40 ` Jason Gunthorpe
2025-06-25 0:48 ` Peter Xu
2025-06-25 13:07 ` Jason Gunthorpe
2025-06-25 17:12 ` Peter Xu
2025-06-25 18:41 ` Jason Gunthorpe
2025-06-25 19:26 ` Peter Xu
2025-06-30 14:05 ` Jason Gunthorpe
2025-07-02 20:58 ` Peter Xu
2025-07-02 23:32 ` Jason Gunthorpe
2025-06-13 17:44 ` Alex Mastro
2025-06-13 18:53 ` Peter Xu
2025-06-13 18:09 ` David Hildenbrand
2025-06-13 19:21 ` Peter Xu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aEx6Qyl3cgiarXZD@x1.local \
--to=peterx@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=amastro@fb.com \
--cc=david@redhat.com \
--cc=jgg@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npache@redhat.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).