From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Will Deacon" <will@kernel.org>
Subject: RE: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
Date: Fri, 25 Jul 2025 05:34:40 +0000
Message-ID: <IA0PR11MB71855A080F43E4D657A70311F859A@IA0PR11MB7185.namprd11.prod.outlook.com>
In-Reply-To: <20250724054443.GP402218@unreal>

Hi Leon,

> Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
> 
> > >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > Add support for exporting PCI device MMIO regions through dma-buf,
> > > enabling safe sharing of non-struct page memory with controlled
> > > lifetime management. This allows RDMA and other subsystems to import
> > > dma-buf FDs and build them into memory regions for PCI P2P operations.
> > >
> > > The implementation provides a revocable attachment mechanism using
> > > dma-buf move operations. MMIO regions are normally pinned as BARs
> > > don't change physical addresses, but access is revoked when the VFIO
> > > device is closed or a PCI reset is issued. This ensures kernel
> > > self-defense against potentially hostile userspace.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  drivers/vfio/pci/Kconfig           |  20 ++
> > >  drivers/vfio/pci/Makefile          |   2 +
> > >  drivers/vfio/pci/vfio_pci_config.c |  22 +-
> > >  drivers/vfio/pci/vfio_pci_core.c   |  25 ++-
> > >  drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
> > >  drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
> > >  include/linux/dma-buf.h            |   1 +
> > >  include/linux/vfio_pci_core.h      |   3 +
> > >  include/uapi/linux/vfio.h          |  19 ++
> > >  9 files changed, 431 insertions(+), 5 deletions(-)
> > >  create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
> 
> <...>
> 
> > > +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev,
> > > +				 struct vfio_device_feature_dma_buf *dma_buf)
> > > +{
> > > +	struct pci_dev *pdev = vdev->pdev;
> > > +	u32 bar = dma_buf->region_index;
> > > +	u64 offset = dma_buf->offset;
> > > +	u64 len = dma_buf->length;
> > > +	resource_size_t bar_size;
> > > +	u64 sum;
> > > +
> > > +	/*
> > > +	 * For PCI the region_index is the BAR number like everything else.
> > > +	 */
> > > +	if (bar >= VFIO_PCI_ROM_REGION_INDEX)
> > > +		return -ENODEV;
> 
> <...>
> 
> > > +/**
> > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> > > + * regions selected.
> > > + *
> > > + * open_flags are the typical flags passed to open(2), eg O_RDWR,
> > > + * O_CLOEXEC, etc. offset/length specify a slice of the region to create
> > > + * the dmabuf from. nr_ranges is the total number of (P2P DMA) ranges
> > > + * that comprise the dmabuf.
> > Any particular reason why you dropped the option (nr_ranges) of creating a
> > single dmabuf from multiple ranges of an MMIO region?
> 
> I did it for two reasons. First, I wanted to simplify the code in order
> to speed up discussion over the patchset itself. Second, I failed to
> find justification for the need for multiple ranges, as the number of
> BARs is limited by VFIO_PCI_ROM_REGION_INDEX (6) and the same
> functionality can be achieved by multiple calls to DMABUF import.
I don't think the same functionality can be achieved by multiple calls to
dmabuf import. AFAIU, a dmabuf (as of today) is backed by an SGL that can
have multiple entries because it represents a scattered buffer (multiple
non-contiguous entries in System RAM or an MMIO region). But this patch
constrains it such that only one entry associated with a buffer would be
included, which effectively means that we can no longer create a dmabuf to
represent scattered buffers (located in a single MMIO region such as VRAM
or other device memory).
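
To make this concrete, below is a minimal sketch (not code from this
patchset) of how an exporter would describe several non-contiguous slices
of one MMIO region with a single sg_table, one SGL entry per slice.
struct mmio_range, ranges_to_sgt, ranges and nr_ranges are hypothetical
names used only for illustration:

#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Hedged sketch: one sg_table entry per non-contiguous MMIO slice. */
struct mmio_range {
	dma_addr_t addr;	/* bus address of the slice */
	unsigned int len;	/* length of the slice in bytes */
};

static struct sg_table *ranges_to_sgt(const struct mmio_range *ranges,
				      unsigned int nr_ranges)
{
	struct sg_table *sgt;
	struct scatterlist *sg;
	unsigned int i;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);
	if (sg_alloc_table(sgt, nr_ranges, GFP_KERNEL)) {
		kfree(sgt);
		return ERR_PTR(-ENOMEM);
	}
	/*
	 * Fill one entry per slice. With the UAPI fixed at a single
	 * offset/length pair, this table can only ever have one entry,
	 * so a scattered buffer cannot be described.
	 */
	for_each_sgtable_sg(sgt, sg, i) {
		sg_dma_address(sg) = ranges[i].addr;
		sg_dma_len(sg) = ranges[i].len;
	}
	return sgt;
}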

> 
> >
> > Restricting the dmabuf to a single range (or having to create multiple
> > dmabufs to represent multiple regions/ranges associated with a single
> > scattered buffer) would be very limiting and may not work in all cases.
> > For instance, in my use-case, I am trying to share a large (4k mode)
> > framebuffer (FB) located in GPU's VRAM between two (p2p compatible) GPU
> > devices. And this would probably not work given that allocating a large
> > contiguous FB (nr_ranges = 1) in VRAM may not be feasible when there is
> > memory pressure.
> 
> Can you please help me and point to the place in the code where this can
> fail?
> I'm probably missing something basic as there are no large allocations
> in the current patchset.
Sorry, I was not very clear. What I meant is that it is not prudent to assume
that there will be only one range associated with an MMIO region that we need
to consider while creating a dmabuf. And I was pointing to my use-case as an
example in which vfio-pci needs to create a dmabuf for a large buffer (FB)
that would likely be scattered (not contiguous) in an MMIO region (such as
VRAM).

Let me further explain with my use-case. Here is a link to my Qemu-based test:
https://gitlab.freedesktop.org/Vivek/qemu/-/commit/b2bdb16d9cfaf55384c95b1f060f175ad1c89e95#81dc845f0babf39649c4e086e173375614111b4a_29_46

While exhaustively testing this case, I noticed that the Guest VM (GPU driver)
would occasionally create the buffer (represented by virtio_gpu_simple_resource,
for which we need to create a dmabuf) in such a way that there are multiple
entries (indicated by res->iov_cnt) that need to be included. This is the main
reason why I added support for nr_ranges > 1 to this patch/feature.

Furthermore, creating multiple dmabufs to represent each range of the same
buffer, as you suggest (IIUC), is suboptimal and does not align with how
dmabuf currently works.
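
For reference, a multi-range variant of the UAPI could look like the sketch
below. This is illustrative only (loosely modeled on the earlier revision of
this feature); the struct names and field layout are not the posted UAPI:

struct vfio_region_dma_range {
	__u64	offset;		/* offset of the range within the BAR */
	__u64	length;		/* length of the range in bytes */
};

struct vfio_device_feature_dma_buf {
	__u32	region_index;
	__u32	open_flags;
	__u32	nr_ranges;	/* number of entries in dma_ranges[] */
	__u32	__reserved;	/* keep the trailing array 8-byte aligned */
	struct vfio_region_dma_range dma_ranges[];
};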

> 
> >
> > Furthermore, since you are adding a new UAPI with this patch/feature, as
> > you know, we cannot go back and tweak it (to add support for nr_ranges > 1)
> > should there be a need in the future, but you can always use nr_ranges = 1
> > anytime. Therefore, I think it makes sense to be flexible in terms of the
> > number of ranges to include while creating a dmabuf instead of restricting
> > ourselves to one range.
> 
> I'm not a big fan of over-engineering. Let's first understand if this
> case is needed.
As explained above with my use-case, having support for nr_ranges > 1 is not
just nice to have but absolutely necessary. Otherwise, this feature would be
constrained to creating dmabufs for contiguous buffers (nr_ranges = 1) only,
which would limit its usefulness, since large GPU buffers are rarely
contiguous.
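
For completeness, here is a hedged userspace sketch of driving the posted
single-range UAPI through VFIO_DEVICE_FEATURE_GET. Note that
VFIO_DEVICE_FEATURE_DMA_BUF is introduced by this patch and is not yet in
released headers; error handling is trimmed for brevity:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int get_bar_dmabuf(int device_fd, __u32 bar, __u64 offset, __u64 length)
{
	char buf[sizeof(struct vfio_device_feature) +
		 sizeof(struct vfio_device_feature_dma_buf)] = {};
	struct vfio_device_feature *feat = (struct vfio_device_feature *)buf;
	struct vfio_device_feature_dma_buf *dma_buf =
		(struct vfio_device_feature_dma_buf *)feat->data;

	feat->argsz = sizeof(buf);
	feat->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;
	dma_buf->region_index = bar;
	dma_buf->open_flags = O_RDWR | O_CLOEXEC;
	dma_buf->offset = offset;
	dma_buf->length = length;

	/* Per the kdoc above: returns the dmabuf fd, or -1 with errno set. */
	return ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);
}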

Thanks,
Vivek

> 
> Thanks
> 
> >
> > Thanks,
> > Vivek
> >
> > > + *
> > > + * Return: The fd number on success, -1 and errno is set on failure.
> > > + */
> > > +#define VFIO_DEVICE_FEATURE_DMA_BUF 11
> > > +
> > > +struct vfio_device_feature_dma_buf {
> > > +	__u32	region_index;
> > > +	__u32	open_flags;
> > > +	__u64	offset;
> > > +	__u64	length;
> > > +};
> > > +
> > >  /* -------- API for Type1 VFIO IOMMU -------- */
> > >
> > >  /**
> > > --
> > > 2.50.1
> >
