From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Alex Williamson <alex.williamson@redhat.com>,
Zi Yan <ziy@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>,
Alex Mastro <amastro@fb.com>, Nico Pache <npache@redhat.com>
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Fri, 13 Jun 2025 20:09:41 +0200 [thread overview]
Message-ID: <d6fbee39-a38f-4f94-bffb-938f7be73681@redhat.com> (raw)
In-Reply-To: <20250613134111.469884-6-peterx@redhat.com>
On 13.06.25 15:41, Peter Xu wrote:
> This patch enables best-effort mmap() for vfio-pci bars even without
> MAP_FIXED, so as to utilize huge pfnmaps as much as possible. It should
> also avoid userspace changes (switching to MAP_FIXED with pre-aligned VA
> addresses) to start enabling huge pfnmaps on VFIO bars.
>
> Here the trick is making sure the MMIO PFNs will be aligned with the VAs
> allocated from mmap() when !MAP_FIXED, so that whatever returned from
> mmap(!MAP_FIXED) of vfio-pci MMIO regions will be automatically suitable
> for huge pfnmaps as much as possible.
>
> To achieve that, a custom vfio_device's get_unmapped_area() for vfio-pci
> devices is needed.
>
> Note that MMIO physical addresses should normally be guaranteed to be
> always bar-size aligned, hence the bar offset can logically be directly
> used to do the calculation. However to make it strict and clear (rather
> than relying on spec details), we still try to fetch the bar's physical
> addresses from pci_dev.resource[].
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There is likely a

Co-developed-by: Alex Williamson <alex.williamson@redhat.com>

missing?
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 3 ++
> drivers/vfio/pci/vfio_pci_core.c | 65 ++++++++++++++++++++++++++++++++
> include/linux/vfio_pci_core.h | 6 +++
> 3 files changed, 74 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 5ba39f7623bb..d9ae6cdbea28 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -144,6 +144,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
> .detach_ioas = vfio_iommufd_physical_detach_ioas,
> .pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas,
> .pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
> +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> + .get_unmapped_area = vfio_pci_core_get_unmapped_area,
> +#endif
> };
>
> static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 6328c3a05bcd..835bc168f8b7 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1641,6 +1641,71 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
> }
>
> +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> +/*
> + * Hint function to provide mmap() virtual address candidate so as to be
> + * able to map huge pfnmaps as much as possible. It is done by aligning
> + * the VA to the PFN to be mapped in the specific bar.
> + *
> + * Note that this function does the minimum check on mmap() parameters to
> + * make the PFN calculation valid only. The majority of mmap() sanity check
> + * will be done later in mmap().
> + */
> +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> + struct file *file,
> + unsigned long addr,
> + unsigned long len,
> + unsigned long pgoff,
> + unsigned long flags)
A very suboptimal way to indent this many parameters; just use two tabs
at the beginning.
> +{
> + struct vfio_pci_core_device *vdev =
> + container_of(device, struct vfio_pci_core_device, vdev);
> + struct pci_dev *pdev = vdev->pdev;
> + unsigned long ret, phys_len, req_start, phys_addr;
> + unsigned int index;
> +
> + index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
Could do

	unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);

at the very top.
> +
> + /* Currently, only bars 0-5 supports huge pfnmap */
> + if (index >= VFIO_PCI_ROM_REGION_INDEX)
> + goto fallback;
> +
> + /* Bar offset */
> + req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
> + phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
> +
> + /*
> + * Make sure we at least can get a valid physical address to do the
> + * math. If this happens, it will probably fail mmap() later..
> + */
> + if (req_start >= phys_len)
> + goto fallback;
> +
> + phys_len = MIN(phys_len, len);
> + /* Calculate the start of physical address to be mapped */
> + phys_addr = pci_resource_start(pdev, index) + req_start;
> +
> + /* Choose the alignment */
> + if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE) {
> + ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> + flags, PUD_SIZE, 0);
> + if (ret)
> + return ret;
> + }
> +
> + if (phys_len >= PMD_SIZE) {
> + ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> + flags, PMD_SIZE, 0);
> + if (ret)
> + return ret;
Similar to Jason, I wonder if that logic should reside in the core, and
we only indicate the maximum page table level we support (e.g., by
passing an "unsigned int order" parameter).
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index fbb472dd99b3..e59699e01901 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -119,6 +119,12 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
> size_t count, loff_t *ppos);
> ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
> size_t count, loff_t *ppos);
> +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> + struct file *file,
> + unsigned long addr,
> + unsigned long len,
> + unsigned long pgoff,
> + unsigned long flags);
Ditto.
--
Cheers,
David / dhildenb