From: Pranjal Shrivastava <praan@google.com>
To: Matt Evans <matt@ozlabs.org>
Cc: "Alex Williamson" <alex@shazbot.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Mahmoud Adam" <mngyadam@amazon.de>,
	"David Matlack" <dmatlack@google.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
	kvm@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v3 7/9] vfio/pci: Support mmap() of a VFIO DMABUF
Date: Fri, 12 Jun 2026 20:35:03 +0000
Message-ID: <aixtd_7gDhf2kisJ@google.com>
In-Reply-To: <20260610154327.37758-8-matt@ozlabs.org>

On Wed, Jun 10, 2026 at 04:43:21PM +0100, Matt Evans wrote:

Hi Matt,

[...]

> +	 *
> +	 * With the goal of taking vdev->memory_lock in a world where
> +	 * vdev might not still exist:
> +	 *
> +	 * 1. Take the resv lock on the DMABUF:
> +	 *  - If racing cleanup got in first, the buffer is revoked;
> +	 *    stop/exit if so.
> +	 *  - If we got in first, the buffer is not revoked so vdev is
> +	 *    non-NULL, accessible, and cleanup _has not yet put the
> +	 *    VFIO device registration_.  So, the device refcount must
> +	 *    be >0.
> +	 *
> +	 * 2. Take vfio_device registration (refcount guaranteed >0
> +	 *    hereafter).
> +	 *
> +	 * 3. Unlock the DMABUF's resv lock:
> +	 *  - A racing cleanup can now complete.
> +	 *  - But, the device refcount >0, meaning the vfio_device
> +	 *    (and vfio_pcie_core device vdev) have not yet been
> +	 *    freed.  vdev is accessible, even if the DMABUF has been
> +	 *    revoked or cleanup has happened, because
> +	 *    vfio_unregister_group_dev() can't complete.
> +	 *
> +	 * 4. Take the vdev->memory_lock
> +	 *  - Either the DMABUF is usable, or has been cleaned up.
> +	 *    Whichever, it can no longer change under us.
> +	 *  - Test the DMABUF revocation status again: if it was
> +	 *    revoked between 1 and 4 return a SIGBUS. Otherwise,
> +	 *    return a PFN.
> +	 *  - It's not necessary to also take the resv lock, because
> +	 *    the status/vdev can't change while memory_lock is held.
> +	 *
> +	 * 5. Unlock, done.
>  	 */
> +
> +	dma_resv_lock(priv->dmabuf->resv, NULL);
> +
> +	if (priv->revoked) {
> +		pr_debug_ratelimited("%s VA 0x%lx, pgoff 0x%lx: DMABUF revoked/cleaned up\n",
> +				     __func__, vmf->address, vma->vm_pgoff);
> +		dma_resv_unlock(priv->dmabuf->resv);
> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	/* If the buffer isn't revoked, vdev is valid */
>  	vdev = priv->vdev;
>  
> +	if (!vfio_device_try_get_registration(&vdev->vdev)) {
> +		/*
> +		 * If vdev != NULL (above), the registration should
> +		 * already be >0 and so this try_get should never
> +		 * fail.
> +		 */
> +		dev_warn(&vdev->pdev->dev, "%s: Unexpected registration failure\n",
> +			 __func__);
> +		dma_resv_unlock(priv->dmabuf->resv);
> +		return VM_FAULT_SIGBUS;
> +	}
> +	dma_resv_unlock(priv->dmabuf->resv);
> +


>  	scoped_guard(rwsem_read, &vdev->memory_lock) {
> +		/* Revocation status must be re-read, under memory_lock */
>  		if (!priv->revoked) {
>  			int pres = vfio_pci_dma_buf_find_pfn(priv, vma,
>  							     vmf->address,

I noticed that the is_aligned_for_order() check from mainline has been
dropped here. Was that intentional?

For huge faults (order > 0), both the PFN and the address must be aligned
to the fault order before we call vfio_pci_vmf_insert_pfn().

The current upstream code has:
  if (is_aligned_for_order(vma, addr, pfn, order))

Should we restore that check here?
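
To make the concern concrete, here is a small userspace sketch of the
conditions such a check would need to cover. It is not the mainline helper:
sketch_aligned_for_order(), struct sketch_vma and the 4 KiB page size are
all assumptions for illustration, and addr is assumed to have been rounded
down to the order-sized boundary by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the VMA bounds; not struct vm_area_struct. */
struct sketch_vma {
	uint64_t vm_start;
	uint64_t vm_end;
};

/*
 * Userspace model of an order-alignment check for a huge fault.
 * addr is assumed to be already aligned down to the order-sized block,
 * and pfn is the PFN that would back addr.
 */
static bool sketch_aligned_for_order(const struct sketch_vma *vma,
				     uint64_t addr, uint64_t pfn,
				     unsigned int order)
{
	uint64_t size;

	assert(order < 64 - SKETCH_PAGE_SHIFT);

	/* Order-0 faults map a single page and need no extra alignment. */
	if (order == 0)
		return true;

	size = (uint64_t)1 << (SKETCH_PAGE_SHIFT + order);

	/* The whole order-sized block must sit inside the VMA. */
	if (addr < vma->vm_start || addr + size > vma->vm_end)
		return false;

	/* The PFN must be a multiple of 2^order pages. */
	return (pfn & (((uint64_t)1 << order) - 1)) == 0;
}
```

With a VMA spanning 0x200000-0x600000, an order-9 fault at 0x200000 with
PFN 0x200 passes. PFN 0x201, or an address whose 2 MiB block would run past
vm_end, does not, and the handler would presumably need to fall back to a
smaller order in that case.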

> @@ -1766,6 +1827,7 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
>  			    __func__, order, pfn, vmf->address,
>  			    vma->vm_pgoff, (unsigned int)ret);
>  
> +	vfio_device_put_registration(&vdev->vdev);
>  	return ret;
>  }
>  
> @@ -1774,7 +1836,7 @@ static vm_fault_t vfio_pci_mmap_page_fault(struct vm_fault *vmf)
>  	return vfio_pci_mmap_huge_fault(vmf, 0);
>  }
>  
> -static const struct vm_operations_struct vfio_pci_mmap_ops = {
> +const struct vm_operations_struct vfio_pci_mmap_ops = {
>  	.fault = vfio_pci_mmap_page_fault,

Nit: Instead of making vfio_pci_mmap_ops global, should we keep it static
and add a helper? E.g.:

void vfio_pci_set_vma_ops(struct vm_area_struct *vma)
{
	vma->vm_ops = &vfio_pci_mmap_ops;
}
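
A rough userspace model of that shape, in case it helps: the ops table
stays file-local and only a setter is visible to other files. All sketch_*
names are hypothetical. In the kernel the helper would take a
struct vm_area_struct and would need a declaration in a shared header.

```c
#include <stddef.h>

/* Hypothetical stand-ins for vm_operations_struct and vm_area_struct. */
struct sketch_vm_ops {
	int (*fault)(void *ctx);
};

struct sketch_area {
	const struct sketch_vm_ops *vm_ops;
};

/* Placeholder fault handler; always reports success (0). */
static int sketch_fault(void *ctx)
{
	(void)ctx;
	return 0;
}

/* The ops table keeps internal linkage, as in the current code. */
static const struct sketch_vm_ops sketch_mmap_ops = {
	.fault = sketch_fault,
};

/* Only this setter has external linkage; callers never name the table. */
void sketch_set_vma_ops(struct sketch_area *area)
{
	area->vm_ops = &sketch_mmap_ops;
}
```

The DMABUF mmap path would then call the setter instead of referencing the
table, so the symbol does not have to be exported.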

[...]

> +
> +static int vfio_pci_dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> +{
> +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +
> +	/*
> +	 * If we observe that the buffer is revoked now then refuse
> +	 * the mmap().  This is a belt-and-braces early failure to
> +	 * ease debugging a revoked buffer being used.  Userspace
> +	 * might also race an mmap() against an explicit revocation,
> +	 * or an action doing a temporary revoke; race scenarios are
> +	 * still safe because the fault handler ultimately prevents
> +	 * access to a revoked buffer if it isn't caught here.
> +	 */
> +	if (READ_ONCE(priv->revoked))
> +		return -ENODEV;
> +	if ((vma->vm_flags & VM_SHARED) == 0)
> +		return -EINVAL;
> +
> +	/*
> +	 * dma_buf_mmap_internal() has asserted that the VMA is
> +	 * contained within the DMABUF size before calling this.
> +	 */
> +
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
> +
> +	/* See comments in vfio_pci_core_mmap() re VM_ALLOW_ANY_UNCACHED. */
> +	vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
> +		     VM_DONTEXPAND | VM_DONTDUMP);
> +	vma->vm_private_data = priv;
> +	vma->vm_ops = &vfio_pci_mmap_ops;
> +
> +	return 0;
> +}
>  #endif /* CONFIG_VFIO_PCI_DMABUF */
>  

Thanks,
Praan

