From: Leon Romanovsky <leon@kernel.org>
To: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
linux-rdma@vger.kernel.org, Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v2 3/3] vfio/pci: Allow MMIO regions to be exported through dma-buf
Date: Mon, 24 Jun 2024 17:11:39 +0300 [thread overview]
Message-ID: <20240624141139.GH29266@unreal> (raw)
In-Reply-To: <20240624065552.1572580-4-vivek.kasireddy@intel.com>
On Sun, Jun 23, 2024 at 11:53:11PM -0700, Vivek Kasireddy wrote:
> From Jason Gunthorpe:
> "dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
>
> The patch design loosely follows the pattern in commit
> db1a8dd916aa ("habanalabs: add support for dma-buf exporter") except this
> does not support pinning.
>
> Instead, this implements what, in the past, we've called a revocable
> attachment using move. In normal situations the attachment is pinned, as a
> BAR does not change physical address. However when the VFIO device is
> closed, or a PCI reset is issued, access to the MMIO memory is revoked.
>
> Revoked means that move occurs, but an attempt to immediately re-map the
> memory will fail. In the reset case a future move will be triggered when
> MMIO access returns. As both close and reset are under userspace control
> it is expected that userspace will suspend use of the dma-buf before doing
> these operations, the revoke is purely for kernel self-defense against a
> hostile userspace."
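
[Editor's note: the "revocable attachment using move" described above maps onto the importer side roughly as follows. This is a sketch, not part of the patch; `my_move_notify`, `my_attach_ops`, and the surrounding names are made up for illustration, though `struct dma_buf_attach_ops` and `dma_buf_dynamic_attach()` are the real dma-buf dynamic-importer interface.]

/*
 * A dynamic importer registers a move_notify callback, which the
 * exporter invokes (with the reservation lock held) when it moves or
 * revokes the buffer. Sketch only; error handling omitted.
 */
static void my_move_notify(struct dma_buf_attachment *attach)
{
	/*
	 * Existing mappings are now stale: quiesce DMA and unmap. A
	 * subsequent dma_buf_map_attachment() fails while access is
	 * revoked and succeeds again once MMIO access returns.
	 */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	/* Required: BAR MMIO has no struct pages, only P2P works. */
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

/* attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, priv); */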
>
> The following enhancements were made to the original patch:
> - Add support for creating dmabuf from multiple areas (or ranges)
> - Add a mmap handler to provide CPU access to the dmabuf
>
> Original-patch-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
> drivers/vfio/pci/Makefile | 1 +
> drivers/vfio/pci/dma_buf.c | 438 +++++++++++++++++++++++++++++
> drivers/vfio/pci/vfio_pci_config.c | 22 +-
> drivers/vfio/pci/vfio_pci_core.c | 20 +-
> drivers/vfio/pci/vfio_pci_priv.h | 23 ++
> include/linux/vfio_pci_core.h | 1 +
> include/uapi/linux/vfio.h | 25 ++
> 7 files changed, 525 insertions(+), 5 deletions(-)
> create mode 100644 drivers/vfio/pci/dma_buf.c
<...>
> +static int populate_sgt(struct dma_buf_attachment *attachment,
> + enum dma_data_direction dir,
> + struct sg_table *sgt, size_t sgl_size)
> +{
> + struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> + struct vfio_region_dma_range *dma_ranges = priv->dma_ranges;
> + size_t offset, chunk_size;
> + struct scatterlist *sgl;
> + dma_addr_t dma_addr;
> + phys_addr_t phys;
> + int i, j, ret;
> +
> + for_each_sgtable_sg(sgt, sgl, j)
> + sgl->length = 0;
> +
> + sgl = sgt->sgl;
> + for (i = 0; i < priv->nr_ranges; i++) {
> + phys = pci_resource_start(priv->vdev->pdev,
> + dma_ranges[i].region_index);
> + phys += dma_ranges[i].offset;
> +
> + /*
> + * Break the BAR's physical range up into max sized SGL's
> + * according to the device's requirement.
> + */
> + for (offset = 0; offset != dma_ranges[i].length;) {
> + chunk_size = min(dma_ranges[i].length - offset,
> + sgl_size);
> +
> +			/*
> +			 * Since the memory being mapped is device
> +			 * memory, it can never be in the CPU caches.
> +			 */
> + dma_addr = dma_map_resource(attachment->dev,
> + phys + offset,
> + chunk_size, dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> + ret = dma_mapping_error(attachment->dev, dma_addr);
> + if (ret)
> + goto err;
> +
> + sg_set_page(sgl, NULL, chunk_size, 0);
> + sg_dma_address(sgl) = dma_addr;
> + sg_dma_len(sgl) = chunk_size;
> + sgl = sg_next(sgl);
> + offset += chunk_size;
> + }
> + }
> +
> + return 0;
> +err:
> + for_each_sgtable_sg(sgt, sgl, j) {
> + if (!sg_dma_len(sgl))
> + continue;
> +
> + dma_unmap_resource(attachment->dev, sg_dma_address(sgl),
> + sg_dma_len(sgl),
> + dir, DMA_ATTR_SKIP_CPU_SYNC);
> + }
> +
> + return ret;
> +}
> +
> +static struct sg_table *
> +vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> + enum dma_data_direction dir)
> +{
> + size_t sgl_size = dma_get_max_seg_size(attachment->dev);
> + struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> + struct sg_table *sgt;
> + unsigned int nents;
> + int ret;
> +
> + dma_resv_assert_held(priv->dmabuf->resv);
> +
> + if (!attachment->peer2peer)
> + return ERR_PTR(-EPERM);
> +
> + if (priv->revoked)
> + return ERR_PTR(-ENODEV);
> +
> + sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> + if (!sgt)
> + return ERR_PTR(-ENOMEM);
> +
> + nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
> + ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> + if (ret)
> + goto err_kfree_sgt;
> +
> + ret = populate_sgt(attachment, dir, sgt, sgl_size);
One of the outcomes of the discussion around Jason's original series was
the decision not to use a scatter-gather list, but instead to extend the
DMA API so that it is usable for non-struct-page memory, which
eliminates the need for an intermediate SG list.
This is why we posted this series:
https://lore.kernel.org/all/cover.1709635535.git.leon@kernel.org
A new version is available here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dma-split-v1
Thanks
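
[Editor's note: a rough shape of the SG-free mapping Leon refers to, based on the dma_iova_* interface that later grew out of the dma-split work. The names and signatures here follow the eventual upstream API and may differ from the branch linked above; treat this strictly as an illustrative sketch, not the code in that series.]

/*
 * Allocate one contiguous IOVA range covering the whole buffer and
 * link the BAR physical range straight into it; no scatterlist is
 * built at any point.
 */
struct dma_iova_state state = {};
int ret;

if (dma_iova_try_alloc(attachment->dev, &state, phys, size)) {
	ret = dma_iova_link(attachment->dev, &state, phys, 0, size,
			    dir, DMA_ATTR_SKIP_CPU_SYNC);
	if (!ret)
		ret = dma_iova_sync(attachment->dev, &state, 0, size);
}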
Thread overview: 7+ messages
2024-06-24 6:53 [PATCH v2 0/3] vfio/pci: Allow MMIO regions to be exported through dma-buf Vivek Kasireddy
2024-06-24 6:53 ` [PATCH v2 1/3] vfio: Export vfio device get and put registration helpers Vivek Kasireddy
2024-06-24 6:53 ` [PATCH v2 2/3] vfio/pci: Share the core device pointer while invoking feature functions Vivek Kasireddy
2024-06-24 6:53 ` [PATCH v2 3/3] vfio/pci: Allow MMIO regions to be exported through dma-buf Vivek Kasireddy
2024-06-24 10:02 ` Daniel Vetter
2024-06-24 14:11 ` Leon Romanovsky [this message]
2024-07-01 4:46 ` [PATCH v2 0/3] " Christoph Hellwig