All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Zi Yan <ziy@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Alex Mastro <amastro@fb.com>, Nico Pache <npache@redhat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Fri, 13 Jun 2025 15:21:39 -0400	[thread overview]
Message-ID: <aEx6Qyl3cgiarXZD@x1.local> (raw)
In-Reply-To: <d6fbee39-a38f-4f94-bffb-938f7be73681@redhat.com>

On Fri, Jun 13, 2025 at 08:09:41PM +0200, David Hildenbrand wrote:
> On 13.06.25 15:41, Peter Xu wrote:
> > This patch enables best-effort mmap() for vfio-pci bars even without
> > MAP_FIXED, so as to utilize huge pfnmaps as much as possible.  It should
> > also avoid userspace changes (switching to MAP_FIXED with pre-aligned VA
> > addresses) to start enabling huge pfnmaps on VFIO bars.
> > 
> > Here the trick is making sure the MMIO PFNs will be aligned with the VAs
> > allocated from mmap() when !MAP_FIXED, so that whatever returned from
> > mmap(!MAP_FIXED) of vfio-pci MMIO regions will be automatically suitable
> > for huge pfnmaps as much as possible.
> > 
> > To achieve that, a custom vfio_device's get_unmapped_area() for vfio-pci
> > devices is needed.
> > 
> > Note that MMIO physical addresses should normally be guaranteed to be
> > always bar-size aligned, hence the bar offset can logically be directly
> > used to do the calculation.  However to make it strict and clear (rather
> > than relying on spec details), we still try to fetch the bar's physical
> > addresses from pci_dev.resource[].
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> There is likely a
> 
> Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
> 
> missing?

Would it mean the same if we use the two SoBs like what this patch uses?
I sincerely don't know the difference..  I hope it's fine to show that this
patch was developed together.  Please let me know otherwise.

> 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   drivers/vfio/pci/vfio_pci.c      |  3 ++
> >   drivers/vfio/pci/vfio_pci_core.c | 65 ++++++++++++++++++++++++++++++++
> >   include/linux/vfio_pci_core.h    |  6 +++
> >   3 files changed, 74 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 5ba39f7623bb..d9ae6cdbea28 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -144,6 +144,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
> >   	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
> >   	.pasid_attach_ioas	= vfio_iommufd_physical_pasid_attach_ioas,
> >   	.pasid_detach_ioas	= vfio_iommufd_physical_pasid_detach_ioas,
> > +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> > +	.get_unmapped_area	= vfio_pci_core_get_unmapped_area,
> > +#endif
> >   };
> >   static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 6328c3a05bcd..835bc168f8b7 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1641,6 +1641,71 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> >   	return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
> >   }
> > +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> > +/*
> > + * Hint function to provide mmap() virtual address candidate so as to be
> > + * able to map huge pfnmaps as much as possible.  It is done by aligning
> > + * the VA to the PFN to be mapped in the specific bar.
> > + *
> > + * Note that this function does the minimum check on mmap() parameters to
> > + * make the PFN calculation valid only. The majority of mmap() sanity check
> > + * will be done later in mmap().
> > + */
> > +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> > +					      struct file *file,
> > +					      unsigned long addr,
> > +					      unsigned long len,
> > +					      unsigned long pgoff,
> > +					      unsigned long flags)
> 
> A very suboptimal way to indent this many parameters; just use two tabs at
> the beginning.

This is the default indentation from Emacs c-mode.

Since this is a VFIO file, I checked the file and looks like there's not
yet a strict rule of indentation across the whole file.  I can switch to
two-tabs for sure if nobody else disagrees.

> 
> > +{
> > +	struct vfio_pci_core_device *vdev =
> > +		container_of(device, struct vfio_pci_core_device, vdev);
> > +	struct pci_dev *pdev = vdev->pdev;
> > +	unsigned long ret, phys_len, req_start, phys_addr;
> > +	unsigned int index;
> > +
> > +	index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
> 
> Could do
> 
> unsigned int index =  pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
> 
> at the very top.

Sure.

> 
> > +
> > +	/* Currently, only bars 0-5 supports huge pfnmap */
> > +	if (index >= VFIO_PCI_ROM_REGION_INDEX)
> > +		goto fallback;
> > +
> > +	/* Bar offset */
> > +	req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
> > +	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
> > +
> > +	/*
> > +	 * Make sure we at least can get a valid physical address to do the
> > +	 * math.  If this happens, it will probably fail mmap() later..
> > +	 */
> > +	if (req_start >= phys_len)
> > +		goto fallback;
> > +
> > +	phys_len = MIN(phys_len, len);
> > +	/* Calculate the start of physical address to be mapped */
> > +	phys_addr = pci_resource_start(pdev, index) + req_start;
> > +
> > +	/* Choose the alignment */
> > +	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE) {
> > +		ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> > +						   flags, PUD_SIZE, 0);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	if (phys_len >= PMD_SIZE) {
> > +		ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> > +						   flags, PMD_SIZE, 0);
> > +		if (ret)
> > +			return ret;
> 
> Similar to Jason, I wonder if that logic should reside in the core, and we
> only indicate the maximum page table level we support.

I replied.  We can continue the discussion there.

Thanks,

-- 
Peter Xu


      reply	other threads:[~2025-06-13 19:21 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-13 13:41 [PATCH 0/5] mm/vfio: huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-06-13 13:41 ` [PATCH 1/5] mm: Deduplicate mm_get_unmapped_area() Peter Xu
2025-06-13 14:12   ` Jason Gunthorpe
2025-06-13 14:55   ` Oscar Salvador
2025-06-13 14:58   ` Zi Yan
2025-06-13 15:57   ` Lorenzo Stoakes
2025-06-13 17:00     ` Pedro Falcato
2025-06-13 18:00   ` David Hildenbrand
2025-06-16  8:01   ` David Laight
2025-06-17 21:13     ` Peter Xu
2025-06-13 13:41 ` [PATCH 2/5] mm/hugetlb: Remove prepare_hugepage_range() Peter Xu
2025-06-13 14:12   ` Jason Gunthorpe
2025-06-13 14:59   ` Oscar Salvador
2025-06-13 15:13   ` Zi Yan
2025-06-13 16:24     ` Peter Xu
2025-06-13 18:01       ` David Hildenbrand
2025-06-14  4:11   ` Liam R. Howlett
2025-06-17 21:07     ` Peter Xu
2025-06-13 13:41 ` [PATCH 3/5] mm: Rename __thp_get_unmapped_area to mm_get_unmapped_area_aligned Peter Xu
2025-06-13 14:17   ` Jason Gunthorpe
2025-06-13 15:13     ` Peter Xu
2025-06-13 16:00       ` Jason Gunthorpe
2025-06-13 18:31         ` Peter Xu
2025-06-13 15:19   ` Zi Yan
2025-06-13 18:33     ` Peter Xu
2025-06-13 15:36   ` Lorenzo Stoakes
2025-06-13 18:45     ` Peter Xu
2025-06-13 19:18       ` Lorenzo Stoakes
2025-06-13 20:34         ` Peter Xu
2025-06-14  5:58           ` Lorenzo Stoakes
2025-06-14  5:23   ` Liam R. Howlett
2025-06-16 12:14     ` Jason Gunthorpe
2025-06-16 12:20       ` Lorenzo Stoakes
2025-06-16 12:26         ` Jason Gunthorpe
2025-06-13 13:41 ` [PATCH 4/5] vfio: Introduce vfio_device_ops.get_unmapped_area hook Peter Xu
2025-06-13 14:18   ` Jason Gunthorpe
2025-06-13 18:03   ` David Hildenbrand
2025-06-14 14:46   ` kernel test robot
2025-06-17 15:39     ` Peter Xu
2025-06-17 15:41       ` Jason Gunthorpe
2025-06-17 16:47         ` Peter Xu
2025-06-17 19:39           ` Peter Xu
2025-06-17 19:46             ` Jason Gunthorpe
2025-06-17 20:01               ` Peter Xu
2025-06-17 23:00                 ` Jason Gunthorpe
2025-06-17 23:26                   ` Peter Xu
2025-06-13 13:41 ` [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-06-13 14:29   ` Jason Gunthorpe
2025-06-13 15:26     ` Peter Xu
2025-06-13 16:09       ` Jason Gunthorpe
2025-06-13 19:15         ` Peter Xu
2025-06-13 23:16           ` Jason Gunthorpe
2025-06-16 22:06             ` Peter Xu
2025-06-16 23:00               ` Jason Gunthorpe
2025-06-17 20:56                 ` Peter Xu
2025-06-17 23:18                   ` Jason Gunthorpe
2025-06-17 23:36                     ` Peter Xu
2025-06-18 16:56                       ` Peter Xu
2025-06-18 17:46                         ` Jason Gunthorpe
2025-06-18 19:15                           ` Peter Xu
2025-06-19 13:58                             ` Jason Gunthorpe
2025-06-19 14:55                               ` Peter Xu
2025-06-19 18:40                                 ` Jason Gunthorpe
2025-06-24 20:37                                   ` Peter Xu
2025-06-24 20:51                                     ` Peter Xu
2025-06-24 23:40                                     ` Jason Gunthorpe
2025-06-25  0:48                                       ` Peter Xu
2025-06-25 13:07                                         ` Jason Gunthorpe
2025-06-25 17:12                                           ` Peter Xu
2025-06-25 18:41                                             ` Jason Gunthorpe
2025-06-25 19:26                                               ` Peter Xu
2025-06-30 14:05                                                 ` Jason Gunthorpe
2025-07-02 20:58                                                   ` Peter Xu
2025-07-02 23:32                                                     ` Jason Gunthorpe
2025-06-13 17:44   ` Alex Mastro
2025-06-13 18:53     ` Peter Xu
2025-06-13 18:09   ` David Hildenbrand
2025-06-13 19:21     ` Peter Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aEx6Qyl3cgiarXZD@x1.local \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=amastro@fb.com \
    --cc=david@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npache@redhat.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.