Re: [PATCH v3 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Pranjal Shrivastava <praan@google.com>
To: Matt Evans <matt@ozlabs.org>
Cc: "Alex Williamson" <alex@shazbot.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Mahmoud Adam" <mngyadam@amazon.de>,
	"David Matlack" <dmatlack@google.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
	kvm@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v3 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs
Date: Mon, 15 Jun 2026 15:07:55 +0000	[thread overview]
Message-ID: <ajAVSx9gPz1KkIbF@google.com> (raw)
In-Reply-To: <effb66ee-764b-4823-b6e4-c932a1f60f8d@ozlabs.org>

On Mon, Jun 15, 2026 at 03:27:01PM +0100, Matt Evans wrote:

Hi Matt,
> 
> On 11/06/2026 21:30, Pranjal Shrivastava wrote:
> > On Wed, Jun 10, 2026 at 04:43:16PM +0100, Matt Evans wrote:
> >> Add vfio_pci_dma_buf_find_pfn(), which a VMA fault handler can use to
> >> find a PFN.
> >>
> >> This supports multi-range DMABUFs, which typically would be used to
> >> represent scattered spans but might even represent overlapping or
> >> aliasing spans of PFNs.
> >>
> >> Because this is intended to be used in vfio_pci_core.c, we also need
> >> to expose the struct vfio_pci_dma_buf in the vfio_pci_priv.h header.
> >>
> >> Signed-off-by: Matt Evans <matt@ozlabs.org>

[...]

> >> +
> >> +	const unsigned long pagesize = PAGE_SIZE << order;
> >> +	unsigned long vma_off = ((vma->vm_pgoff - priv->vma_pgoff_adjust) <<
> >> +				 PAGE_SHIFT) & VFIO_PCI_OFFSET_MASK;
> >> +	unsigned long rounded_page_addr = ALIGN_DOWN(address, pagesize);
> >> +	unsigned long rounded_page_end = rounded_page_addr + pagesize;
> >> +	unsigned long page_buf_offset;
> >> +	unsigned long page_buf_offset_end;
> >> +	unsigned long range_buf_offset = 0;
> >> +	unsigned int i;
> >> +
> >> +	if (rounded_page_addr < vma->vm_start || rounded_page_end > vma->vm_end) {
> >> +		if (order > 0)
> >> +			return -EAGAIN;
> >> +
> >> +		/* A fault address outside of the VMA is absurd. */
> >> +		WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",
> >> +		     address, vma->vm_start, vma->vm_end);
> > 
> > This could flood dmesg if triggered repeatedly by userspace :( 
> > Since a fault outside the VMA is an invalid access that already results
> > in a SIGBUS, we could probably avoid the WARN here?
> > Perhaps pr_warn_ratelimited() should suffice?
> 
> I'm OK moving to a pr_warn_ratelimited().  Note though that this case is
> "genuinely impossible" currently and the check exists in case something
> changes elsewhere.  (Re your flood comment, am I missing a way for
> userspace to trigger this?  The scenario is a faulthandler for a VMA
> getting a VA outside the bounds of that VMA; such a fault address
> wouldn't match that VMA.)

I should've been explicit, I guess I worded it wrong, my bad.
I didn't mean that a user-space could hit this on it's own. However,
I meant to say if there's a bug in core mm during some future work,
dmesg being flooded by stackdumps gets messy (especially during dev) as
the underlying reason might get missed in the flood. Hence, I prefer
moving to pr_warn_ratelimited.

> 
> >> +		return -EFAULT;
> >> +	}
> >> +
> >> +	/*
> >> +	 * page_buff_offset[_end] is the span of DMABUF offsets
> >> +	 * corresponding to the faulting page:
> >> +	 */
> >> +	if (unlikely(check_add_overflow(rounded_page_addr - vma->vm_start,
> >> +					vma_off, &page_buf_offset) ||
> >> +		     check_add_overflow(page_buf_offset, pagesize,
> >> +					&page_buf_offset_end)))
> >> +		return -EFAULT;
> >> +
> >> +	for (i = 0; i < priv->nr_ranges; i++) {
> >> +		size_t range_len = priv->phys_vec[i].len;
> >> +		phys_addr_t range_start = priv->phys_vec[i].paddr;
> >> +
> >> +		/*
> >> +		 * If the current range starts after the page's span,
> >> +		 * this and any future range won't match.  Bail early.
> >> +		 */
> >> +		if (page_buf_offset_end <= range_buf_offset)
> >> +			break;
> >> +
> >> +		if (page_buf_offset >= range_buf_offset &&
> >> +		    page_buf_offset_end <= range_buf_offset + range_len) {
> >> +			/*
> >> +			 * The faulting page is wholly contained
> >> +			 * within the span represented by the range.
> >> +			 * Validate PFN alignment for the order:
> >> +			 */
> >> +			unsigned long pfn = (range_start + page_buf_offset -
> >> +					     range_buf_offset) / PAGE_SIZE;
> > 
> > Minor nit: I'm aware that decent compilers convert pow(2) divides to >> 
> > However, we seem to be using `>> PAGE_SHIFT` across vfio-pci. E.g.:
> > 
> > return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
> > unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
> > 
> > Let's consider using the same pattern?
> 
> (Do you know of a compiler that both builds the kernel and does NOT
> perform this transformation?  I am confident that resulting object code
> will be OK here.)
> 

I guess most of the modern compiler do, I was just referring to the
style across the file. I don't have any strong opinion.

> In an earlier revision I was using shifts but they were fairly messy
> compared to this expression, which arises from a request by Jason.
> 

Yea, looking back at Jason's comment [1], I think he was mainly pointing out
that the common factor (PAGE_SIZE) could be grouped together. But again,
no strong feeling about this, just picked up a pattern across the file. 
If it breaks on some compiler we can fix it later..

> >> +
> >> +			if (IS_ALIGNED(pfn, 1 << order)) {
> >> +				*out_pfn = pfn;
> >> +				return 0;
> >> +			}
> >> +			/* Retry with smaller order */
> >> +			return -EAGAIN;
> >> +		}
> >> +		range_buf_offset += range_len;
> >> +	}
> >> +
> >> +	/*
> >> +	 * A hugepage straddling a range boundary will fail to match a
> >> +	 * range, but the address will (eventually) match when retried
> >> +	 * with a smaller page.
> >> +	 */
> >> +	if (order > 0)
> >> +		return -EAGAIN;
> >> +
> >> +	/*
> >> +	 * If we get here, the address fell outside of the span
> >> +	 * represented by the (concatenated) ranges.  Setup of a
> > 
> > Nit: double space before "Setup" and "But" below.
> 
> I liked Alex's response :-)  This is common practice for monospaced text
> since increasing inter-sentence spacing helps readability in paragraph
> blocks (see Documentation/ for many examples ...).
> 

Ack. :)

> >> +	 * mapping must ensure that the VMA is <= the total size of
> >> +	 * the ranges, so this should never happen.  But, if it does,
> >> +	 * force SIGBUS for the access and warn.
> >> +	 */
> >> +	WARN_ONCE(1, "No range for addr 0x%lx, order %d: VMA 0x%lx-0x%lx pgoff 0x%lx, %u ranges, size 0x%zx\n",
> >> +		  address, order, vma->vm_start, vma->vm_end, vma->vm_pgoff,
> >> +		  priv->nr_ranges, priv->size);
> >> +
> >> +	return -EFAULT;
> > 
> > The fall-through logic at the end feels a bit redundant.
> > 
> > If we've exhausted the phys_vec list without finding a match, returning
> > -EAGAIN for order > 0 seems like the correct fallback behavior.
> 
> This path can happen (for order > 0) e.g. mis-alignment of VA versus the
> PFN, i.e. is likely...
> 
> > However, the subsequent WARN_ONCE for the order == 0 seems unnecessary?
> > An out-of-bounds access is an error that should simply return -EFAULT 
> > (converting to SIGBUS) without polluting the kernel log with stackdumps?
> 
> ...but the only way this can happen, for order == 0, is if the VMA
> extends beyond the underlying resource.  For example, if the VMA is
> larger than the DMABUF size (the total length of phys ranges set up
> inside the DMABUF).  Both VFIO BAR mmap() and a DMABUF mmap() disallow
> mapping off the end of the underlying resource.  That is, this also
> "cannot happen" but if logic changes elsewhere then we will really want
> to know about hitting this case -- the check is not redundant.

I didn't mean to imply that the path itself is impossible or won't 
happen.. I just meant that the logic / structure felt a bit redundant at
the end of the function..

Instead of having the separate `if (order > 0)` block falling through to
the base case, I suggest it could be cleaner as:

	ret = order ? -EAGAIN : -EFAULT;

	if (ret == -EFAULT)
		pr_warn_ratelimited(...);

	return ret;

But again, that's a preference. I'd leave that to your judgement.

> 
> Still, it doesn't need a regdump/backtrace (at least while this is only
> called from one spot), so a pr_warn_* is better.
> 

Ack.

Thanks,
Praan

next prev parent reply	other threads:[~2026-06-15 15:08 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-10 15:43 [PATCH v3 0/9] vfio/pci: Add mmap() for DMABUFs Matt Evans
2026-06-10 15:43 ` [PATCH v3 1/9] PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE Matt Evans
2026-06-10 18:39   ` Leon Romanovsky
2026-06-11 16:07   ` Bjorn Helgaas
2026-06-11 17:44     ` Matt Evans
2026-06-11 18:37   ` Pranjal Shrivastava
2026-06-12  3:39     ` Tian, Kevin
2026-06-12 14:31       ` Matt Evans
2026-06-10 15:43 ` [PATCH v3 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs Matt Evans
2026-06-11 20:30   ` Pranjal Shrivastava
2026-06-12 17:37     ` Alex Williamson
2026-06-12 18:21       ` Pranjal Shrivastava
2026-06-15 14:27     ` Matt Evans
2026-06-15 15:07       ` Pranjal Shrivastava [this message]
2026-06-12  8:42   ` Tian, Kevin
2026-06-15 18:04     ` Matt Evans
2026-06-10 15:43 ` [PATCH v3 3/9] vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA Matt Evans
2026-06-12  8:43   ` Tian, Kevin
2026-06-12  9:20   ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 4/9] vfio/pci: Convert BAR mmap() to use a DMABUF Matt Evans
2026-06-12  8:46   ` Tian, Kevin
2026-06-15 15:33     ` Matt Evans
2026-06-12 10:41   ` Pranjal Shrivastava
2026-06-12 15:22     ` Matt Evans
2026-06-12 19:43       ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 5/9] vfio/pci: Provide a user-facing name for BAR mappings Matt Evans
2026-06-12  8:46   ` Tian, Kevin
2026-06-12 14:06   ` Pranjal Shrivastava
2026-06-15 15:13     ` Matt Evans
2026-06-10 15:43 ` [PATCH v3 6/9] vfio/pci: Clean up BAR zap and revocation Matt Evans
2026-06-12 19:39   ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 7/9] vfio/pci: Support mmap() of a VFIO DMABUF Matt Evans
2026-06-12 20:35   ` Pranjal Shrivastava
2026-06-10 15:43 ` [PATCH v3 8/9] vfio/pci: Permanently revoke a DMABUF on request Matt Evans
2026-06-10 15:43 ` [PATCH v3 9/9] vfio/pci: Add mmap() attributes to DMABUF feature Matt Evans
2026-06-12  8:27 ` [PATCH v3 0/9] vfio/pci: Add mmap() for DMABUFs Tian, Kevin
2026-06-12 15:11   ` Matt Evans
2026-06-12 15:17     ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajAVSx9gPz1KkIbF@google.com \
    --to=praan@google.com \
    --cc=alex@shazbot.org \
    --cc=amastro@fb.com \
    --cc=ankita@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn@kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=dmatlack@google.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=leon@kernel.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=matt@ozlabs.org \
    --cc=mngyadam@amazon.de \
    --cc=sumit.semwal@linaro.org \
    --cc=vivek.kasireddy@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.