linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Will Deacon" <will@kernel.org>
Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
Date: Sun, 27 Jul 2025 09:16:15 +0300	[thread overview]
Message-ID: <20250727061615.GW402218@unreal> (raw)
In-Reply-To: <IA0PR11MB71855A080F43E4D657A70311F859A@IA0PR11MB7185.namprd11.prod.outlook.com>

On Fri, Jul 25, 2025 at 05:34:40AM +0000, Kasireddy, Vivek wrote:
> Hi Leon,
> 
> > Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO
> > regions
> > 
> > > >
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > >
> > > > Add support for exporting PCI device MMIO regions through dma-buf,
> > > > enabling safe sharing of non-struct page memory with controlled
> > > > lifetime management. This allows RDMA and other subsystems to
> > import
> > > > dma-buf FDs and build them into memory regions for PCI P2P
> > operations.
> > > >
> > > > The implementation provides a revocable attachment mechanism using
> > > > dma-buf move operations. MMIO regions are normally pinned as BARs
> > > > don't change physical addresses, but access is revoked when the VFIO
> > > > device is closed or a PCI reset is issued. This ensures kernel
> > > > self-defense against potentially hostile userspace.
> > > >
> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > ---
> > > >  drivers/vfio/pci/Kconfig           |  20 ++
> > > >  drivers/vfio/pci/Makefile          |   2 +
> > > >  drivers/vfio/pci/vfio_pci_config.c |  22 +-
> > > >  drivers/vfio/pci/vfio_pci_core.c   |  25 ++-
> > > >  drivers/vfio/pci/vfio_pci_dmabuf.c | 321
> > +++++++++++++++++++++++++++++
> > > >  drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
> > > >  include/linux/dma-buf.h            |   1 +
> > > >  include/linux/vfio_pci_core.h      |   3 +
> > > >  include/uapi/linux/vfio.h          |  19 ++
> > > >  9 files changed, 431 insertions(+), 5 deletions(-)
> > > >  create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
> > 
> > <...>
> > 
> > > > +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev,
> > > > +				 struct vfio_device_feature_dma_buf
> > *dma_buf)
> > > > +{
> > > > +	struct pci_dev *pdev = vdev->pdev;
> > > > +	u32 bar = dma_buf->region_index;
> > > > +	u64 offset = dma_buf->offset;
> > > > +	u64 len = dma_buf->length;
> > > > +	resource_size_t bar_size;
> > > > +	u64 sum;
> > > > +
> > > > +	/*
> > > > +	 * For PCI the region_index is the BAR number like  everything else.
> > > > +	 */
> > > > +	if (bar >= VFIO_PCI_ROM_REGION_INDEX)
> > > > +		return -ENODEV;
> > 
> > <...>
> > 
> > > > +/**
> > > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> > > > + * regions selected.
> > > > + *
> > > > + * open_flags are the typical flags passed to open(2), eg O_RDWR,
> > > > O_CLOEXEC,
> > > > + * etc. offset/length specify a slice of the region to create the dmabuf
> > from.
> > > > + * nr_ranges is the total number of (P2P DMA) ranges that comprise the
> > > > dmabuf.
> > > Any particular reason why you dropped the option (nr_ranges) of creating
> > a
> > > single dmabuf from multiple ranges of an MMIO region?
> > 
> > I did it for two reasons. First, I wanted to simplify the code in order
> > to speed-up discussion over the patchset itself. Second, I failed to
> > find justification for need of multiple ranges, as the number of BARs
> > are limited by VFIO_PCI_ROM_REGION_INDEX (6) and same functionality
> > can be achieved by multiple calls to DMABUF import.
> I don't think the same functionality can be achieved by multiple calls to
> dmabuf import. AFAIU, a dmabuf (as of today) is backed by a SGL that can
> have multiple entries because it represents a scattered buffer (multiple
> non-contiguous entries in System RAM or an MMIO region). 

I don't know all the reasons why SG was chosen, but one of the main
reasons is that DMA SG API was the only one possible way to handle p2p
transfers (peer2peer flag).


> But in this patch you are constraining it such that only one entry associated with a
> buffer would be included, which effectively means that we cannot create
> a dmabuf to represent scattered buffers (located in a single MMIO region
> such as VRAM or other device memory) anymore. 

Yes

> 
> > 
> > >
> > > Restricting the dmabuf to a single range (or having to create multiple
> > dmabufs
> > > to represent multiple regions/ranges associated with a single scattered
> > buffer)
> > > would be very limiting and may not work in all cases. For instance, in my
> > use-case,
> > > I am trying to share a large (4k mode) framebuffer (FB) located in GPU's
> > VRAM
> > > between two (p2p compatible) GPU devices. And, this would probably not
> > work
> > > given that allocating a large contiguous FB (nr_ranges = 1) in VRAM may
> > not be
> > > feasible when there is memory pressure.
> > 
> > Can you please help me and point to the place in the code where this can
> > fail?
> > I'm probably missing something basic as there are no large allocations
> > in the current patchset.
> Sorry, I was not very clear. What I meant is that it is not prudent to assume that
> there will only be one range associated with an MMIO region which we need to
> consider while creating a dmabuf. And, I was pointing out my use-case as an
> example where vfio-pci needs to create a dmabuf for a large buffer (FB) that
> would likely be scattered (and not contiguous) in an MMIO region (such as VRAM).
> 
> Let me further explain with my use-case. Here is a link to my Qemu-based test:
> https://gitlab.freedesktop.org/Vivek/qemu/-/commit/b2bdb16d9cfaf55384c95b1f060f175ad1c89e95#81dc845f0babf39649c4e086e173375614111b4a_29_46

Ohh, thanks. I'll add nr_ranges in next version. I see that you are
using same region_index for all ranges and this is how I would like to
keep it: "multiple nr_ranges, same region_index".

Thanks

  reply	other threads:[~2025-07-27  6:16 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-23 13:00 [PATCH 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf Leon Romanovsky
2025-07-23 13:00 ` [PATCH 01/10] PCI/P2PDMA: Remove redundant bus_offset from map state Leon Romanovsky
2025-07-24  7:50   ` Christoph Hellwig
2025-07-23 13:00 ` [PATCH 02/10] PCI/P2PDMA: Introduce p2pdma_provider structure for cleaner abstraction Leon Romanovsky
2025-07-24  7:51   ` Christoph Hellwig
2025-07-24  7:55     ` Leon Romanovsky
2025-07-24  7:59       ` Christoph Hellwig
2025-07-24  8:07         ` Leon Romanovsky
2025-07-27 18:51         ` Jason Gunthorpe
2025-07-29  7:52           ` Christoph Hellwig
2025-07-29  8:53             ` Leon Romanovsky
2025-07-29 10:41               ` Christoph Hellwig
2025-07-29 11:39                 ` Leon Romanovsky
2025-07-29 13:15             ` Jason Gunthorpe
2025-07-29 16:12   ` Jason Gunthorpe
2025-07-23 13:00 ` [PATCH 03/10] PCI/P2PDMA: Simplify bus address mapping API Leon Romanovsky
2025-07-24  7:52   ` Christoph Hellwig
2025-07-23 13:00 ` [PATCH 04/10] PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation Leon Romanovsky
2025-07-23 13:00 ` [PATCH 05/10] PCI/P2PDMA: Export pci_p2pdma_map_type() function Leon Romanovsky
2025-07-24  8:03   ` Christoph Hellwig
2025-07-24  8:13     ` Leon Romanovsky
2025-07-25 16:30       ` Logan Gunthorpe
2025-07-25 18:54         ` Leon Romanovsky
2025-07-25 19:12           ` Logan Gunthorpe
2025-07-27  6:01             ` Leon Romanovsky
2025-07-27 19:05         ` Jason Gunthorpe
2025-07-28 16:12           ` Logan Gunthorpe
2025-07-28 16:41             ` Leon Romanovsky
2025-07-28 17:07               ` Logan Gunthorpe
2025-07-28 23:11                 ` Jason Gunthorpe
2025-07-29 20:54                   ` Logan Gunthorpe
2025-07-29 22:14                     ` Jason Gunthorpe
2025-07-30  8:03                     ` Leon Romanovsky
2025-07-29  7:52       ` Christoph Hellwig
2025-07-29  8:45         ` Leon Romanovsky
2025-07-27 19:02     ` Jason Gunthorpe
2025-07-23 13:00 ` [PATCH 06/10] types: move phys_vec definition to common header Leon Romanovsky
2025-07-23 13:00 ` [PATCH 07/10] vfio: Export vfio device get and put registration helpers Leon Romanovsky
2025-07-23 13:00 ` [PATCH 08/10] vfio/pci: Enable peer-to-peer DMA transactions by default Leon Romanovsky
2025-07-23 13:00 ` [PATCH 09/10] vfio/pci: Share the core device pointer while invoking feature functions Leon Romanovsky
2025-07-28 20:55   ` Alex Williamson
2025-07-29  8:39     ` Leon Romanovsky
2025-07-23 13:00 ` [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions Leon Romanovsky
2025-07-24  5:13   ` Kasireddy, Vivek
2025-07-24  5:44     ` Leon Romanovsky
2025-07-25  5:34       ` Kasireddy, Vivek
2025-07-27  6:16         ` Leon Romanovsky [this message]
2025-07-29 19:44   ` Robin Murphy
2025-07-29 20:13     ` Jason Gunthorpe
2025-07-30  9:32       ` Leon Romanovsky
2025-07-30 14:49       ` Robin Murphy
2025-07-30 16:01         ` Jason Gunthorpe
2025-07-30 19:58 ` [PATCH 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf Alex Williamson
2025-07-31  0:21   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250727061615.GW402218@unreal \
    --to=leon@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bhelgaas@google.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=m.szyprowski@samsung.com \
    --cc=robin.murphy@arm.com \
    --cc=sumit.semwal@linaro.org \
    --cc=vivek.kasireddy@intel.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).